Chapter 16
USING ONTOLOGIES TO CREATE OBJECT MODEL FOR OBJECT-ORIENTED SOFTWARE ENGINEERING
Dencho N. Batanov (1) and Waralak Vongdoiwang (2)
(1) School of Advanced Technologies, Asian Institute of Technology, Klong Luang, Pathumthani, 12120, Thailand; (2) University of the Thai Chamber of Commerce, Din Daeng, Bangkok, 10400, Thailand
Abstract:
In this paper we introduce and discuss our approach to using ontologies to create an object model from a problem domain text description, as a basic deliverable of the analysis phase in Object-Oriented Software Engineering. For this purpose we first briefly compare object models with ontologies. The object model of a system consists of objects identified from the text description and structural linkages corresponding to existing or established relationships. Ontologies provide metadata schemas, offering a controlled vocabulary of concepts. At the center of both object models and ontologies are objects within a given problem domain. Both concepts are based on reusability through intensive use of libraries. The major difference is that, while the object model explicitly shows structural dependencies between objects in a system, including their properties, relationships and behavior, ontologies are based on related terms (concepts) only. Because an ontology is accepted as a formal, explicit specification of a shared conceptualization, we can naturally link ontologies with object models, which represent a system-oriented map of related objects. To become usable programming entities, these objects should be described as Abstract Data Types (ADTs). This paper addresses ontologies as the basis of a complete methodology for identifying objects and modeling them as (converting them to) ADTs, including procedures and available tools, such as CORPORUM OntoExtract and VisualText, that can help the conversion process. An illustrative example shows how developers can apply this methodology.
Key words:
Object Model; Ontologies; Knowledge base; Object-Oriented; Software Engineering
1.
Raj Sharman, Rajiv Kishore and Ram Ramesh
INTRODUCTION
An ontology is a specification of a representational vocabulary for a shared domain of discourse: definitions of classes, relations, functions, and other objects (Gruber, 1993) or, more generally, a specification of a conceptualization (Gruber, 1994). The basic components of an ontology are concepts, relationships between concepts, and attributes. Concepts, relationship types and attributes are abstracted from the objects and thus describe the schema (the ontology). Objects, in turn, populate the concepts, while values and relationships instantiate the attributes of those objects and the relationships among them, respectively. Three types of relationships may be used between classes or concepts in an ontology: generalization, association, and aggregation. An ontology is well known as a structured, declarative and abstract way to express the domain information of an application (Angele, Staab & Schurr, 2003). The concepts in an ontology are similar to objects in object-oriented software engineering. To solve the problem of heterogeneity in developing software applications, there is a need for specific descriptions of all kinds of concepts, for example, classes (general things), the relationships that can exist among them, and their properties (or attributes) (Heflin, Volz, and Dale, 2002). Ontologies described syntactically on the basis of languages such as eXtensible Markup Language (XML), XML Schema, Resource Description Framework (RDF), and RDF Schema (RDFS) can be successfully used for this purpose. Object models differ from other modeling techniques because they merge the concepts of variables and abstract data types into an abstract variable type: an object. Objects have identity, state, and behavior, and object models are structural representations of a system of those objects (based on the concepts of type, inheritance, association, and possibly class; ChiMu Corporation, 2003).
In the artificial intelligence (AI) area, ontology work has focused on knowledge modeling. At the same time, many industry standards and powerful tools have been developed for object-oriented analysis, design, and implementation of complex software systems. Because of the close connection between ontologies and object models, these maturing standards and tools can be used for ontology modeling (Cranefield & Purvis, 1999). Object orientation has been a commonly accepted paradigm in software engineering for the last few decades. The motto of object-oriented software development may be formulated in different ways, but its essence can be stated simply: “Identify and concentrate on objects in the problem domain description first. Think about the system function later.” At the initial analysis phase, however, identifying the right objects, which are vital for the system’s functionality, seems to be the most difficult task in the whole
Ontology Handbook
development process from both theoretical and practical points of view. Object-oriented software development is well supported by a huge number of working methods, techniques, and tools, except for this starting point: object identification and building the related system object model. Converting the text description of the system problem domain and the respective functional requirement specifications into an object model is usually left to the intuition and experience of developers (system analysts). One commonly accepted rule of thumb is, “If an object fits within the context of the system’s responsibilities, then include it in the system.” However, since the members of a development team are likely to have different views on many points, serious communication problems may occur during the later phases of the software development process. Recently there has been great research interest in applying ontologies to solve this "language ambiguity problem", as either an ontology-driven or an ontology-based approach (Deridder & Wouters, 1999). This is especially true for object-oriented software engineering, mainly because of the similarity in the principles of the two paradigms. Moreover, object systems, like ontologies, represent a conceptualized analysis of a given domain and can therefore be easily reused for different applications (Swartout, 1999). Representation of objects as Abstract Data Types (ADTs) is of primary importance in developing object-oriented software, because such development is actually a process of implementing ADTs in software. An ADT is a named set of attributes, which show the characteristics of objects and formalize the relationships between them, and of methods (operations, functions), which put the behavior of objects into effect and make the system functional enough to be of practical use.
Building an accurate, correct and objectively well-defined object model containing objects represented as ADTs is the basis for successful development of an object-oriented software system (Weiss, 1993; Manola, 1999). The basic idea is that implementing ADTs as code allows all working objects (instances of classes) to share one and the same behavior, which can be changed dynamically in a centralized manner for higher efficiency and effectiveness. During the software development process, objects are transformed from “real things” to concepts, and finally to Abstract Data Types, as shown in Fig. 16-1.
Real Thing
Concept: STUDENT - Person who is studying in an academic system
Abstract Data Type: STUDENT - Attributes with their types - Behavior (methods, operations, functions)
Figure 16-1. Conceptualization and ADTs
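The last step of Fig. 16-1, from the concept STUDENT to an ADT, can be illustrated with a minimal Python sketch. The attribute names (student_id, name, courses) and the methods enroll and is_active are hypothetical choices made only for illustration; they are not the ADT derived later by the methodology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Student:
    """ADT sketch for the concept STUDENT: a person studying in an academic system."""

    # Attributes with their types (hypothetical names, chosen for illustration).
    student_id: str
    name: str
    courses: List[str] = field(default_factory=list)

    # Behavior (methods, operations, functions).
    def enroll(self, course_title: str) -> None:
        """Record enrollment in a course, ignoring duplicates."""
        if course_title not in self.courses:
            self.courses.append(course_title)

    def is_active(self) -> bool:
        """Assumed rule: a student is active when enrolled in at least one course."""
        return len(self.courses) > 0
```

Because the behavior lives in the class, every instance of Student shares it, and changing a method in one place changes it for all objects.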
Our approach to converting a text description into an object model, described in this paper, is based on eight different models. Only two of them, namely the text description model (T-model) and the class (object) model (C-model), are included in the classical object-oriented software development process. The remaining models represent specific analysis work that developers should do in order to benefit from using ontologies for semi-formal identification of the objects responsible for the system functionality, and of their respective ADTs. The basic idea is to ensure suitable transformation of the models from one to another using respective procedures and tools, which can be considered potential elements for integrating ontologies into CASE tools for object-oriented systems. The paper is structured as follows. Section 2 compares the similarities and differences between object models and ontologies in modeling, languages used, implementation, and reusability. Section 3 presents an overview of the approach to converting a text description into an object model using ontologies. Section 4 introduces the models used and describes the overall procedure for their transformation. This section also shows the techniques and tools that can be used in practice for model transformation. An illustrative example of a part of an information system for the domain of academic management is used throughout the paper to support the explanations. Finally, Section 5 outlines conclusions and some recommendations for future work.
2.
SIMILARITIES AND DIFFERENCES BETWEEN ONTOLOGIES AND OBJECT MODEL
2.1
Modeling
2.1.1
Ontologies Modeling
In this section, we present two commonly employed ontology modeling approaches. The first is formalized using mainly four kinds of components: classes, slots, facets, and instances.
• Concepts/classes: the concepts in ontologies are arranged in an inheritance hierarchy. For example, a university domain ontology might contain the concepts Student and Master_Student, and the relationship between them is Master_Student is_a Student.
• Slots: slots represent the attributes of the classes. Possible slot types are primitive types (integer, boolean, string, etc.), references to other objects (modeling relationships) and sets of values of those types (Knublauch, 1999). For example, each Student has a name slot of type string.
• Facets: they are attached to classes or slots and contain meta information, such as comments, constraints and default values. For example, in order to verify the status of a Student, a Student has to enroll in at least one Course.
• Instances: instances represent specific entities from the domain knowledge base (KB). For example, the knowledge base of the university ontology might contain the specific student John and the course Database.
Another approach to modeling ontologies is formalized using five kinds of components: classes, relations, functions, axioms and instances (Corcho, Fermandez, & Perez, 2001).
• Classes: classes are usually organized in taxonomies. Sometimes, the taxonomies are considered to be full ontologies (Studer, Benjamins, & Fensel, 1998). Classes or concepts are used in a broad sense. A concept can be anything about which something is said and, therefore, could also be the description of a task, function, action, strategy, reasoning process, etc.
• Relations: they represent a type of interaction between concepts of the domain. They are formally defined as any subset of a product of n sets, that is: R ⊆ C1 × C2 × ... × Cn. For instance, binary relations may include subclass-of and connected-to.
• Functions: they are a special case of relations in which the n-th element of the relationship is unique for the n-1 preceding elements. Formally, functions are defined as: F: C1 × C2 × ... × Cn-1 → Cn. An example of a function is update, which takes lists of students and the titles of their theses and returns one list of names with added titles. This function belongs to the object Thesis.
• Axioms: they are used to model sentences that are always true. They can be included in an ontology for several purposes, such as defining the meaning of ontology components, defining complex constraints on the values of attributes or the arguments of relations, verifying the correctness of the information specified in the ontology, or deducing new information.
• Instances: they are used to represent specific elements of classes.
As we can see, the two approaches to modeling ontologies are quite similar. The difference is in the included elements, which lead to lightweight (the first case) or heavyweight (the second case) ontologies and respective differences in the details of available output information.
2.1.2
Object Modeling
Object modeling is formalized using mainly four kinds of components: Objects/classes, attributes/properties, methods/operations/functions, and relationships/relations.
• Object/class: everything known in the real world. For example, a university system might contain the objects Student, Course, Registry, Assignment, etc. Every class serves as a pattern for describing its instances (objects).
• Attributes/properties: objects must have at least one attribute. Possible attribute types are primitive types (integer, boolean, string, etc.), references to other objects (modeling relationships) and sets of values of those types. For example, each Student has an identification number as an attribute, and its type is string.
• Methods/operations/functions: they are declared and defined on the object’s attributes and may contain meta information, such as comments, constraints and default values. For example, in order to verify the status of a Student (a Student has to enroll in at least one Course), a method/function get_course() can be declared and defined.
• Relationships/relations: they represent an abstraction of a set of associations that hold systematically between different objects in the real world. For example, the relationship between object Student and object Course is take_course; conversely, object Course has the relationship taken_by with object Student. The major types of relationships are classification, generalization/specialization, association, and aggregation.
This brief comparison between ontology and object modeling shows close similarity, especially in the elements (components) used. However, the most substantial difference is in the deliverables. While ontologies represent a well-structured description (explanation) of mutually related terms, the object model represents a system as a structure of modules (objects) ready for implementation in software. This is actually the basic difference between ontologies and object models, and it remains valid for all the comparisons that follow.
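The difference in deliverables can be made concrete with a small Python sketch built on the Student and Course example used above. The dictionary layout of UNIVERSITY_ONTOLOGY and the helper has_valid_status are assumptions introduced only for illustration; the names take_course, taken_by and get_course come from the text.

```python
# Ontology view: related terms only, described declaratively (assumed layout).
UNIVERSITY_ONTOLOGY = {
    "Student": {"is_a": None, "slots": {"name": "string"}},
    "Master_Student": {"is_a": "Student", "slots": {}},
    "Course": {"is_a": None, "slots": {"title": "string"}},
}


# Object model view: classes with attributes, relationships and behavior.
class Course:
    def __init__(self, title):
        self.title = title        # attribute
        self.taken_by = []        # relationship: Course taken_by Student


class Student:
    def __init__(self, student_id, name):
        self.student_id = student_id  # attribute of type string
        self.name = name
        self.courses = []             # relationship: Student take_course Course

    def take_course(self, course):
        """Establish the association in both directions."""
        if course not in self.courses:
            self.courses.append(course)
            course.taken_by.append(self)

    def get_course(self):
        """Return the titles of the courses this student takes."""
        return [course.title for course in self.courses]

    def has_valid_status(self):
        """Hypothetical check: enrolled in at least one Course."""
        return len(self.get_course()) >= 1
```

The dictionary only names terms and how they relate, whereas the classes can be instantiated and executed, which is what makes the object model ready for implementation.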
2.2
Languages
Object-oriented modeling is used in a growing number of commercial software development projects. Many languages are considered mature and well accepted by the community of object-oriented software developers. Some of the most popular are Ada95, C++, Java, C#, Eiffel, Smalltalk, etc. A large number of languages have been used for ontology specification during the last decade. Many of these languages had already been used for representing knowledge in knowledge-based applications; others have been adapted from existing knowledge representation languages. There is, however, also a group of languages that were specifically created for the representation of ontologies, such as Ontolingua[1], LOOM[2], OCML[3],
FLogic[4], etc. Recently, many other languages have been developed in the context of the World Wide Web (and have had great impact on the development of the Semantic Web), such as RDF[5] and RDF Schema[6], SHOE[7], XOL[8], OML[9], OIL[10], DAML+OIL[11], etc. Their syntax is mainly based on XML[12], which has been widely adopted as a standard language for exchanging information on the Web. More specifically, SHOE (whose syntax is based on HTML[13]), RDF and RDF Schema cannot be considered pure ontology specification languages, but rather general languages for describing metadata on the Web. Most of these “markup” languages are still in a development phase and are continuously evolving. Fig. 16-2 shows the main relationships between the above-mentioned languages (Corcho, Fermandez & Perez 2001). In addition, there are other languages that have been created for the specification of specific ontologies, such as CycL[14] or GRAIL[15] (in the medical domain).
Figure 16-2. Pyramid of Web-based languages (Corcho, Fermandez & Perez, 2001)
In summary, the languages used for object-oriented software development and those used for describing ontologies are clearly different. The former are programming languages, while the latter are declarative by nature. As a result, specific methods and techniques are required if we need to work with information from ontologies for the purposes of object orientation.
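As a rough illustration of the gap between a declarative ontology language and a programming language, the following Python sketch reads a tiny class hierarchy written in RDF Schema and turns it into Python classes. The RDFS fragment, the function names and the use of the standard-library XML parser are assumptions made for this example; a real project would more likely rely on a dedicated RDF toolkit, and this is not the conversion procedure proposed in this paper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical RDFS fragment for the university example.
ONTOLOGY_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Student"/>
  <rdfs:Class rdf:ID="Master_Student">
    <rdfs:subClassOf rdf:resource="#Student"/>
  </rdfs:Class>
</rdf:RDF>
"""


def read_class_hierarchy(xml_text):
    """Map each class name to its parent class name (or None)."""
    root = ET.fromstring(xml_text.strip())
    hierarchy = {}
    for cls in root.findall(f"{{{RDFS_NS}}}Class"):
        name = cls.get(f"{{{RDF_NS}}}ID")
        parent = cls.find(f"{{{RDFS_NS}}}subClassOf")
        parent_name = None
        if parent is not None:
            parent_name = parent.get(f"{{{RDF_NS}}}resource").lstrip("#")
        hierarchy[name] = parent_name
    return hierarchy


def build_classes(hierarchy):
    """Create empty Python classes that mirror the is_a hierarchy."""
    classes = {}

    def build(name):
        if name not in classes:
            parent = hierarchy[name]
            bases = (build(parent),) if parent else (object,)
            classes[name] = type(name, bases, {})
        return classes[name]

    for name in hierarchy:
        build(name)
    return classes
```

The generated classes carry only the inheritance structure. Attributes and behavior still have to be supplied by the developer, which is exactly where a purely declarative description stops.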
2.3
Implementations
The languages are used to help implementation of object models and ontologies. The basic approach for implementing objects is to have a class, which serves as a pattern for creating instances (objects). A class defines what types the objects will implement, how to perform the behavior required for the interface and how to remember state information. Each object will then only need to remember its individual state. Although using classes is by
far the most common object approach, it is not the only one. For example, using prototypes is another approach, but it is peripheral to the core concepts of object-oriented modeling (Mohan & Brooks, 2003). The Object Data Management Group (ODMG) Object Model is intended to allow portability of applications among object database products. In particular, the ODMG model extends the OMG core to provide for persistent objects, object properties, more specific object types, queries and transactions. The basic concepts are objects, types, operations, properties, identity and subtyping. Objects have state (defined by the values of their properties), behavior (defined by operations) and identity. All objects of the same type have common behavior and properties. Types are themselves objects and so may have their own properties. A type has an interface and one or more implementations. All things are instances of some type, and subtyping organizes these types in a lattice. A type definition can declare that an extent (the set of all instances) be maintained for the type (Cranefield & Purvis, 1999). One of the more recent developments in implementing ontologies on the Web is known as the Semantic Web (Volz, Oberle, & Studer, 1999). The Semantic Web is an extension of the Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. Two important technologies for developing the Semantic Web are XML and RDF, which mainly offer an appropriate syntax for ontology description. XML allows users to add arbitrary structure to documents without saying what these structures mean. RDF allows meaning to be specified between objects on the Web and was intentionally designed as a metadata modeling language. An important aspect of the Semantic Web is a set of ontologies.
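The class-based approach described in this section, together with the idea of an extent, can be sketched in a few lines of Python. This is only an illustration of the concepts, not an ODMG-compliant implementation; the class-level list named extent and the method describe are assumptions made for the example.

```python
class Course:
    # Extent: the set of all instances, maintained at the level of the type.
    # (Held as strong references here; a real system would manage lifetimes.)
    extent = []

    def __init__(self, title):
        self.title = title           # individual state of each object
        Course.extent.append(self)   # register the new instance in the extent

    def describe(self):
        """Behavior defined once in the class and shared by all instances."""
        return f"Course: {self.title}"
```

Each object remembers only its own title, while the behavior and the extent are kept once, in the class.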
2.4
Reusability
Reusability is the most desired goal of the entire software development cycle and stems from the reluctance to reinvent something that has already been invented. Object-oriented development supports reusability, especially through the principles of abstraction. Inheritance supports continual iterations of analysis until unique objects are found in a class hierarchy. These unique objects inherit characteristics from the higher-level classes, which allows information from previously defined classes to be reused and eliminates the need to reinvent it. Class structure leads to the development of class libraries that allow sharing of models and programming throughout a system. The development process can be simplified, from analysis to requirements to implementation, through the use of building blocks of classes and objects (Thomason & Sauter, 1999).
Examples of some of the most popular commercial libraries are Booch Components[16] (Ada95, C, and C++ versions), KISS (Keep It Simple Series of Generics, by Osiris), Object Space C++ Libraries, etc. As the number of different ontologies grows, maintaining and reorganizing them in order to facilitate the re-use of knowledge becomes challenging. If we could share knowledge across systems, costs would be reduced. However, because knowledge bases are typically constructed from scratch, each one with its own idiosyncratic structure, sharing is difficult. Recent research has focused on the use of ontologies to promote sharing. An ontology is a hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base. If two knowledge bases are built on a common ontology, knowledge can be more readily shared, since they share a common underlying structure (Swartout, Patil, Knight, and Russ, 1996). Ontologies thus play an important role in knowledge sharing and reuse. Ontology library systems are an important tool for grouping and re-organizing ontologies for further re-use, integration, maintenance, mapping and versioning. An ontology library system is a library system that offers various functions for managing, adapting and standardizing groups of ontologies. It should fulfill the needs for re-use of ontologies. In this sense, an ontology library system should be easily accessible and offer efficient support for re-using existing relevant ontologies and standardizing them based on upper-level ontologies and ontology representation languages. An ontology library system requires a functional infrastructure to store and maintain ontologies, an uncomplicated adaptation environment for editing, searching and reasoning over ontologies, and strong standardization support through upper-level ontologies and standard ontology representation languages (Ding & Fensel, 2001).
In the computer science area, ontologies aim at capturing domain knowledge in a generic way and provide a commonly agreed understanding of a domain, which may be reused and shared across applications. Assuming an ontology library is available, how can we select ontologies from this library that are potentially useful for a new application? Typically, application development is in fact a combination of ontology construction and reuse. Some ontologies already exist and can be taken from a library. Others might not be provided by the library and have to be constructed from scratch. Some important available ontology library systems are WebOnto[17], Ontolingua[1], DAML Ontology library system[18], SHOE[7], Ontology Server[19], IEEE Standard Upper Ontology[20], OntoServer[21], ONIONS[22], etc. Ontologies allow the specification of concepts in a domain, which are actually objects. Shared ontologies allow different systems to come to a common understanding of the semantics of participating objects (Mohan & Brooks, 2003). The reuse role of a library of ontologies is central. Such a library
contains a number of existing definitions organized along different levels of abstraction in order to increase sharing and minimize duplicated effort (Ding & Fensel, 2001).
3.
OVERVIEW OF THE APPROACH TO CONVERTING TEXT DESCRIPTION TO OBJECT MODEL USING ONTOLOGY
Our approach is based on transformation of models. Models are an inseparable and highly significant part of any methodology. They help developers to better understand complex tasks and to represent in a simpler way the work they should do to solve those tasks. Object-oriented analysis of a system under development is a good example of such a complex task. The complexity stems from the fact that in object-oriented development everything is based on objects, but their identification in a given problem domain is left completely to the intuition of the developer. All that he/she has as a starting point is the text description of the problem domain, which is itself an extended model of the usually very general and ambiguous initial user requirements. Following existing practice, we accept this text description (T-model) as an available model, which serves as the starting point of our transformation process. According to the object-oriented software development methodology, the analysis work on the T-model leads to two major deliverables: the functional specification of the system, expressed either as text or graphically as Use Case diagrams, and the object (class) model (we call it the C-model). The ultimate goal of the developer's efforts is actually the creation of the C-model. This is because the objects included in the C-model should contain the complete information necessary for the next phases of design and implementation of the software system. In other words, the objects should be represented as ADTs, that is, as software modules ready for design and implementation. The problem with "language ambiguity" is now clear: different interpretations of the T-model, without any formal support for the choice of participating objects, would lead to C-models that are quite probably inconsistent, incomplete or inefficient for the further steps of design and implementation.
We believe that using ontology as a conceptualization tool working on the T-model can make the process of creating the C-model semi-formal, if not fully formal, and in this way help developers in this complex and imprecise task. This is the major motivation of our work described briefly in this paper.
[Figure residue, partially recovered: T-model (text description model), with sample text "The doctoral student must normally have ..."; O-model]