Domain Modeling with Integrated Ontologies: Principles for Reconciliation and Reuse

Aneel Advani, MD, MPH, Samson Tu, MS, and Mark Musen, MD, PhD
Section on Medical Informatics, Stanford University School of Medicine
Stanford University, Stanford, CA 94305-5479

In recent years, modeling knowledge in reusable domain ontologies has become a central focus in the knowledge-representation community. We have constructed an integrated medical domain ontology for an expert system for guideline-directed care that uses distributed problem-solving components and their individual task-oriented ontologies. We discuss the extent to which the task-committed portions must be modified and reconciled before they can be incorporated into the global domain ontology. We elucidate a design process consisting of three principles for merging domain ontologies that satisfy the requirements of reuse and sharing among distributed problem-solving components: (1) constrain the design process such that a common controlled vocabulary is used; (2) address the interaction problem of integrating consistent task-specific semantics into the global domain ontology by adding frame-based attributes, rather than mappings between vocabulary elements; and (3) use multiple-parent hierarchies with subsumed ontologies to maximize the modularity of the domain ontology. We thus show that, to combine individual component ontologies, we must modify them using a coherent set of principles that facilitates their reuse in an integrated ontology.

INTRODUCTION

In recent years, the knowledge-representation community has shifted its emphasis from modeling knowledge confined to specific reasoning processes to modeling knowledge in reusable domain ontologies.1 Indeed, the existence of domain ontologies, which are declarative conceptualizations of the terminology and knowledge in a domain, requires that we be able to distinguish knowledge from the reasoning process that will use that knowledge.
The implication of this modeling view of knowledge representation is that knowledge declared in a domain ontology can be used across different tasks or subtasks.2 However, ontologies from task-specific components cannot be integrated automatically into global domain ontologies without modification or adaptation. Constructing reusable domain ontologies, therefore, is not straightforward: there is a tradeoff between specific use and reuse. That is, the more committed an ontology is to a specific domain and task, the less its terminological elements can be generalized and reused in other domains and tasks. Gruber3 and Guarino4 use the term ontological commitment, borrowed from logic, to mean choosing and sharing vocabulary elements coherently and consistently across tasks. Many authors have defended reusable ontologies as a worthwhile and reachable goal in knowledge representation.5 Several of them have attempted to show that domain ontologies can be constructed for specific systems in which the ontology is reused across alternative tasks or applications.6

In this paper, we discuss the problems of reuse and commitment in our work on the domain ontology for EON, an architecture for integrating distributed problem-solving components for the management of guideline-directed medical therapy7 (Figure 1). The problem-solving components that make up the EON architecture include a therapy-planning component that takes as input a clinical protocol description and patient data and generates a situation-specific therapeutic plan using the episodic skeletal plan refinement (ESPR) method.8 Another component, called RÉSUMÉ,9 performs temporal abstraction and reasoning. Each of the components had its own method and domain-specific application ontology. A central problem in the EON project was how to create a global ontology that could be reused as an integrated and independently maintained ontology for the entire EON system. We thus had to determine how to reconcile the concepts in the component ontologies used by these two problem solvers.

From our experience in constructing the EON domain ontology, we have developed three principles that describe a domain-modeling process that shares and reuses ontologies maximally across different tasks. The first principle is that it is advantageous to constrain the design process such that a controlled medical vocabulary is used to populate the elements of the common domain ontology.
Next, we present a strategy to deal with the interaction problem,10 a name Bylander and Chandrasekaran have given to the domain-modeling problem to reflect the conflicts that arise in the use of ontologies prepared for different tasks. The second principle is that we can most efficiently address the interaction problem by adding frame-based attributes rather than mappings between vocabulary elements in each ontology. Third, we use a principle of maximum modularity in the design of the central domain ontology by using multiple-parent hierarchies with subsumed domain- and task-specific ontologies.

Figure 1. [Diagram: EON Domain Model Architecture. Labels: Integrated Domain Ontology, Planner Ontology, RÉSUMÉ Ontology, Controlled Domain Vocabulary, knowledge acquisition, Instantiated Knowledge Base, Patient Data Model, clinical data.] The EON system provides the architecture for coordinating the function of problem-solving modules and a database mediator with that of a medical knowledge base derived from a global domain ontology. The integrated domain ontology is composed of three different components: the therapy-planning method ontology, the temporal-abstraction component's method ontology, and the domain-specific vocabulary.

USE OF A CONTROLLED VOCABULARY

Figure 2a. The concept of a low hemoglobin is named ANEMIA in the controlled terminology, as shown in the Protégé/Win Ontology Editor window.

Figure 2b. In the RÉSUMÉ component, we replaced the class HEMOGLOBIN_STATE with the class ANEMIA derived from the controlled vocabulary. (The Ontology Editor window has been cropped here.)

One area in which we had to reconcile conflicting ontological commitments while integrating the existing domain ontologies of the therapy-planning (ESPR) and temporal-abstraction (RÉSUMÉ) problem-solving methods was their use of different vocabularies. Because these two ontologies had been used separately in previous generations of our system, the problem of how the named elements of a particular domain ontology represented concepts, a syntactic question, became paramount. For example, in RÉSUMÉ, which takes primitive clinical variables and forms intelligent abstractions from their values, the atomic name used for the states abstracted from the parameter HEMOGLOBIN was HEMOGLOBIN_STATE. The values of this class would be LOW or HIGH, depending on whether the course of the patient's hemoglobin was consistent with anemia or erythrocytosis, as determined by the temporal-abstraction method. However, the therapy-planning component of our model used an actual value of this state abstraction (namely, ANEMIA) as an atomic term in its terminology. Certainly, as far as guideline authors are concerned, anemia is the more natural medical term for the concept of a state of low hemoglobin. Thus, the question for the domain modeler was how to reconcile the two tokens that represented divergent commitments to the same concept in the two ontologies.

To resolve the conflict, we employed a design process that precluded the problem. This process required the use of a controlled terminology as the superset of domain-specific terms used in our system's domain ontology. In this case, the controlled terminology we used was the T-Helper system vocabulary.11 In this vocabulary, the word anemia was the atomic name associated with the state of low hemoglobin (Figure 2a). We thus replaced the class name HEMOGLOBIN_STATE with ANEMIA (Figure 2b). (We then satisfied the semantic needs of the temporal-abstraction component by changing the attributes of the class ANEMIA to include information that the class represents a low state of the HEMOGLOBIN parameter.
We discuss the ontological commitment of adding semantics to the controlled vocabulary in the next two sections.) The idea of choosing a relevant subset of a larger controlled terminology to represent terms in a reusable ontology is not new.12 It stems from the design principle that restricting the process of domain modeling is an important part of creating reusable domain ontologies. Requiring domain-model development to draw on a larger terminology, however, presupposes that such large-scale vocabularies exist, and only recently have these larger controlled terminologies begun to inform the software-engineering process.13

THE INTERACTION PROBLEM

Conflict in ontological commitments also arises in the interaction problem, which involves the relations of the task-specific ontologies of the different problem-solving components to one another and to the global domain model. When creating a domain ontology that will be reused with more than one problem-solving method, we must link the domain-model terms to the task-specific terms of each problem-solving method. Thus, there is a continuum between the method ontologies and the domain ontology that must be modeled consistently. For example, the temporal-abstraction component was designed to use a subclass relationship to elaborate the value of a parameter within a given context. Specifically, in the diabetes domain, the state of hypoglycemia (an abstraction of the glucose parameter) would be defined differently in the context of insulin administration and in the context of no insulin. In the RÉSUMÉ component, these two definitions are given in two subclasses of the class GLUCOSE_STATE, GLUCOSE_STATE_NO_INSULIN and GLUCOSE_STATE_INSULIN, which represent the glucose state in these two contexts (Figure 3a). The RÉSUMÉ component thus created an implicit context hierarchy as an extension of the parameter ontology by appending context names to parameter names. In the therapy-planner ontology, however, the concept of a context extending the names of classes did not exist. Thus, RÉSUMÉ's use of a shorthand name to define a contextualized subclass was another method-specific commitment that we had to reconcile within the overall domain ontology. Retaining RÉSUMÉ's strategy would have led to long parameter names that were not in the controlled vocabulary, such as GLUCOSE_STATE_INSULIN_NPH_TYPE1_HOSPITALX, and would have defeated the purpose of having an ontology that was reusable across potential additional tasks, such as building a user interface.
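As a rough illustration of the naming problem, the Python sketch below generates RÉSUMÉ-style subclass names by appending context tokens to a parameter name. The context dimensions and their values (insulin regimen, diabetes type, institution) are hypothetical, chosen only so that the example name GLUCOSE_STATE_INSULIN_NPH_TYPE1_HOSPITALX appears among the results; the function name is likewise illustrative.

```python
from __future__ import annotations

from itertools import product


def contextualized_names(parameter: str, dimensions: list[list[str]]) -> list[str]:
    """Build one subclass name per combination of context values by appending
    context tokens to the parameter name, e.g. GLUCOSE_STATE_INSULIN."""
    return ["_".join([parameter, *combo]) for combo in product(*dimensions)]


# Hypothetical context dimensions, for illustration only.
insulin = ["NO_INSULIN", "INSULIN_REG", "INSULIN_NPH"]
diabetes_type = ["TYPE1", "TYPE2"]
institution = ["HOSPITALX", "HOSPITALY"]

names = contextualized_names("GLUCOSE_STATE", [insulin, diabetes_type, institution])
print(len(names))  # 3 * 2 * 2 = 12 subclasses for a single parameter
print(names[8])    # GLUCOSE_STATE_INSULIN_NPH_TYPE1_HOSPITALX
```

The number of names is the product of the sizes of the context dimensions, so each new dimension multiplies the class count, and none of the generated names is a term in the controlled vocabulary.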
Therefore, we had to find another solution to this problem. We considered two plausible solutions. The strategy we chose was to augment the definition of the class HYPOGLYCEMIA (the replacement for GLUCOSE_STATE drawn from the controlled terminology) with an attribute, or slot, that contained the relation between the class and the contexts in which the class was to be interpreted (Figure 3b,c). We then constructed a separate IS-A hierarchy of the context names themselves, so that the context references could refer to a structured set of contexts, such as INSULIN_REG and INSULIN_NPH. We were thus able to create different instances of the knowledge needed for different subtasks. In addition to handling examples such as diabetes in different insulin contexts, we could incorporate knowledge about different cutoffs for laboratory values in different institutions and knowledge of the different prescribing preferences of particular physicians in an institution.

Figure 3a. The RÉSUMÉ component uses appended token names in subclasses to represent context-specific concepts.

Figure 3b. The integrated ontology removes the subclasses but adds context-reference slots.

Figure 3c. Part of the frame for the class HYPOGLYCEMIA is shown. Note the highlighted slot for context-specific instances of the class.

An alternative solution would have been a direct mapping between the vocabulary of one task and that of another. For example, GLUCOSE_STATE would have a slot with HYPOGLYCEMIA as its value, and vice versa. We found this approach undesirable for two reasons. First, it did not respect the use of the controlled terminology to minimize conflicting ontological commitments associated with the same or overlapping concepts: the difference in meaning between HYPOGLYCEMIA and GLUCOSE_STATE would have created a conflict analogous to that between ANEMIA and HEMOGLOBIN_STATE discussed above. Second, the mapping itself was a highly intricate task. The resulting domain ontology would have to be checked for inconsistencies and for truth maintenance during knowledge acquisition. This burden would grow polynomially as the domain ontology was reused by more problem-solving components, such as a user-interface component, because a new mapping would have to be created from each new component to the existing components.
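The two strategies can be contrasted in a short Python sketch. It is a toy model under stated assumptions, not the EON or Protégé implementation: the slot names (parameter, value, context_refs), the context names NO_INSULIN, INSULIN, and CONTEXT, the placeholder cutoffs, and the rule that a lookup walks up the context IS-A hierarchy to the nearest context that has an instance are all hypothetical; only HYPOGLYCEMIA, INSULIN_REG, and INSULIN_NPH come from the text. The last function counts vocabulary mappings under the alternative, assuming every pair of components is mapped in both directions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical IS-A hierarchy of context names (child -> parent).
CONTEXT_PARENT = {
    "INSULIN_REG": "INSULIN",
    "INSULIN_NPH": "INSULIN",
    "INSULIN": "CONTEXT",
    "NO_INSULIN": "CONTEXT",
}


def context_ancestors(context: str) -> list[str]:
    """Return the context followed by its ancestors, most specific first."""
    chain = [context]
    while chain[-1] in CONTEXT_PARENT:
        chain.append(CONTEXT_PARENT[chain[-1]])
    return chain


@dataclass
class Frame:
    """A minimal frame: a class name plus named slots (attributes)."""
    name: str
    slots: dict[str, object] = field(default_factory=dict)


@dataclass
class Instance:
    """Context-specific knowledge stored as an instance, not as a subclass."""
    of_class: Frame
    context: str
    definition: dict[str, object]


# One context-free class from the controlled vocabulary (slot names assumed).
HYPOGLYCEMIA = Frame("HYPOGLYCEMIA", {"parameter": "GLUCOSE", "value": "LOW"})

# Hypothetical instances; the cutoff values are placeholders, not clinical data.
instances = [
    Instance(HYPOGLYCEMIA, "NO_INSULIN", {"cutoff": "placeholder_cutoff_1"}),
    Instance(HYPOGLYCEMIA, "INSULIN", {"cutoff": "placeholder_cutoff_2"}),
]
# The context-reference slot records the contexts in which the class is interpreted.
HYPOGLYCEMIA.slots["context_refs"] = [inst.context for inst in instances]


def definition_for(cls: Frame, context: str) -> Optional[dict[str, object]]:
    """Return the definition from the instance whose context is the nearest
    ancestor of `context` in the context hierarchy, or None if none matches."""
    for ctx in context_ancestors(context):
        for inst in instances:
            if inst.of_class is cls and inst.context == ctx:
                return inst.definition
    return None


def directed_mappings(n_components: int) -> int:
    """Mappings needed under the alternative if every component's vocabulary
    is mapped to every other component's, in both directions."""
    return n_components * (n_components - 1)


print(definition_for(HYPOGLYCEMIA, "INSULIN_NPH"))  # falls back to INSULIN
print([directed_mappings(n) for n in (2, 3, 4)])
```

Under these assumptions, supporting another institution's laboratory cutoffs means adding an instance with a new context rather than a new subclass name, whereas the mapping count grows quadratically with the number of components.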

Our solution, in contrast, simplified the construction of the domain ontology by making the contextualized subclasses in RÉSUMÉ different instances of the main context-free class. In essence, we gained simplicity and reuse of the domain ontology at the cost of more complex knowledge acquisition. But this tradeoff becomes an advantage if knowledge acquisition is automated. Indeed, this benefit is precisely the reason that domain modeling and knowledge acquisition have been separated in methodologies for automated knowledge acquisition such as Protégé.14

MAXIMIZING MODULARITY

We could, however, use the mapping solution for those parts of the ontology in which the interaction problem caused no conflict, that is, wherever general domain terms and task-specific concepts could be used by more than one task without contradiction. When this situation arose, the most efficient way to design the domain ontology was to take advantage of the disjunction and to partition the ontology along that dimension. This solution is similar to the GAMES-II approach, which characterized concepts in terms of domain specificity and method specificity and modularized around these axes.15 The strategy allowed us to reuse one subset of the ontology in another through the use of multiple parents. For example, the ontology for the temporal-abstraction component was designed such that the domain-independent parts of the ontology were at the top of the hierarchy and the domain-dependent parts were at the lower levels. Thus, the STATE_ABSTRACTION class required specific children, such as HYPOGLYCEMIA in the diabetes domain (Figure 4). The term HYPOGLYCEMIA, however, was listed under the class LAB_PROBLEM in the controlled-terminology ontology and in the therapy-planning ontology.
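The placement just described, in which HYPOGLYCEMIA is needed under STATE_ABSTRACTION in the temporal-abstraction ontology and under LAB_PROBLEM in the controlled-terminology and therapy-planning ontologies, can be sketched as a small IS-A graph in Python. This is a toy representation, assuming a dictionary of parent lists rather than Protégé frames; the helper names are hypothetical. The class is defined once, yet it falls inside the subtree that each component would use.

```python
from __future__ import annotations

# Hypothetical IS-A graph: class -> list of parents (names from the text).
IS_A: dict[str, list[str]] = {
    "LAB_PROBLEM": [],        # controlled terminology / therapy-planning tree
    "STATE_ABSTRACTION": [],  # domain-independent temporal-abstraction tree
    "HYPOGLYCEMIA": ["LAB_PROBLEM", "STATE_ABSTRACTION"],  # two parents
}


def ancestors(cls: str) -> set[str]:
    """All classes reachable by following IS-A links upward (excluding cls)."""
    found: set[str] = set()
    stack = list(IS_A.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found


def subtree(root: str) -> set[str]:
    """Classes subsumed by `root`: the view one component's ontology needs."""
    return {c for c in IS_A if c == root or root in ancestors(c)}


planner_view = subtree("LAB_PROBLEM")
resume_view = subtree("STATE_ABSTRACTION")
print(sorted(planner_view & resume_view))  # ['HYPOGLYCEMIA']
```

Because the shared class is a single node with two parents, neither tree has to copy or map it, which is the sense in which multiple parenting lets the two hierarchies mesh.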
Therefore, allowing multiple parenting, so that HYPOGLYCEMIA could be a subclass of both LAB_PROBLEM and STATE_ABSTRACTION, allowed us to mesh the tree structures of the two method ontologies when the concepts in each method ontology did not have conflicting meanings. Thus, by using multiple parents and modularization, we greatly reduced the problem of divergent ontological structures among the component ontologies.

Figure 4. An example of multiple parenting. The class HYPOGLYCEMIA is defined under two parents: originally as a child of LAB_PROBLEM and again as a child of STATE_ABSTRACTION.

CONCLUSION

The design principles that we used in constructing our global domain model evolved from the different requirements for integrating the components of the EON architecture. We learned that the criteria for reconciling conflicts in domain modeling throughout the architecture had to arise out of a coherent view of the domain-modeling process itself. This process required assuring the syntactic, semantic, and structural integration of the component ontologies. In particular, we learned that domain modeling had to begin with a controlled medical vocabulary to be used in the entire system. This vocabulary ensured that the commitment of each task to a named concept was internally consistent. Moreover, it allowed for the potential use of a larger standardized lexicon, such as SNOMED,16 which could ensure external consistency with a wider set of applications. The next step in the process, adding semantics to the controlled vocabulary, also required that we consider ontological commitment and reuse. We found that the interaction problem could be addressed if we were willing to increase the complexity of the classes of the component ontologies in our frame-based system by extending their frames to include reconciling information, rather than by mapping between conflicting classes. In essence, we showed that we can simplify domain modeling even if the amount of knowledge acquisition based on the domain model cannot be reduced. Moreover, we can reduce the complexity of the class hierarchy in the domain model by increasing the number of specific instances of those classes in the knowledge base. Finally, we showed that we could improve the structural integrity and reusability of the domain model by increasing the modularity of the global ontology through multiple parenting and subsumed ontologies.
The issue of reconciling the different ontological commitments in the ontologies of distributed components in a system has only recently begun receiving serious attention.17 There is now a great chasm between the availability of numerous reusable problem-solving methods and the relative paucity of good reusable medical domain ontologies. Because the development of reusable domain ontologies is now the most costly and rate-limiting step in the construction of working knowledge-based systems, the problem of reconciling and reusing disparate ontologies has become more acute. We have attempted to identify design principles that specifically allow us to solve the problem of integrating inconsistently committed subsets of a global domain ontology so that the goal of reuse can be achieved.

Acknowledgments

This work was supported in part by grants LM05708 and LM05305 from the National Library of Medicine. Dr. Musen is also supported by NSF Young Investigator Award IRI-9257578. We thank Yuval Shahar, John Gennari, Zaki Hasan, Amar Das, and John Nguyen for valued discussions and assistance on this project.

References

1. Laresgoiti I, Anjewierden A, Bernaras A, Corera J, Schreiber ATh, Wielinga BJ. Ontologies as vehicles for reuse: a mini-experiment. In: Gaines BR, Musen MA, eds. Proc of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW96), Banff, Canada, 1996:30-1 to 30-21.
2. Gennari JH, Tu SW, Rothenfluh TE, Musen MA. Mapping domains to methods in support of reuse. IJHCS 1994; 41:399-424.
3. Gruber TR. Toward principles for the design of ontologies used for knowledge sharing. IJHCS 1995; 43:907-928.
4. Guarino N, Carrara M, Giaretta P. Formalizing ontological commitment. In: Proc of Natl Conf on AI (AAAI-94). Seattle: Morgan Kaufmann, 1994.
5. Guarino N. Understanding, building, and using ontologies. In: Gaines BR, Musen MA, eds. Proc 10th Banff KA Workshop, Banff, Canada, 1996:29-1 to 29-13.
6. Swartout B, Patil R, Knight K, Russ T. Toward distributed use of large-scale ontologies. In: Gaines BR, Musen MA, eds. Proc 10th Banff KA Workshop, Banff, Canada, 1996:32-1 to 32-19.
7. Musen MA, Tu SW, Das AK, Shahar Y. EON: a component-based approach to automation of protocol-directed therapy. JAMIA 1996; 3:367-388.
8. Tu SW, Musen MA. The EON model of intervention protocols and guidelines. In: Proc of the AMIA Fall Symposium. Washington, DC: AMIA, 1996:587-591.
9. Shahar Y, Musen MA. Knowledge-based temporal abstraction in clinical domains. AI in Med 1996; 8:267-298.
10. Bylander T, Chandrasekaran B. Generic tasks in knowledge-based reasoning: the right level of abstraction for knowledge acquisition. In: Gaines BR, Boose JH, eds. Knowledge Acquis for KB Systems. London: Academic Press, 1988:65-77.
11. Musen MA, Weickert KE, Miller ET, Campbell KE, Fagan LM. Development of a controlled medical terminology: knowledge acquisition and knowledge representation. Meth Inform Med 1995; 34:85-95.
12. Bowker G, Star SL. Situations versus standards in long-term, wide-scale, decision-making: the case of the International Classification of Diseases. In: Milutinovic V, Shriver BD, eds. Proc of the 24th Ann Hawaii Int Conf on System Sciences (vol 4). Los Alamitos, CA: IEEE Comp Soc Press, 1991:73-81.
13. Musen MA. Domain ontologies in software engineering: use of Protégé in the EON architecture. In: Proc of the IMIA WG 6 Conf on Nat Lang and Med Concept Representation, Jacksonville, Florida, 1997.
14. Tu SW, Eriksson H, Gennari JH, Shahar Y, Musen MA. Ontology-based configuration of problem-solving methods and generation of knowledge-acquisition tools. AI in Med 1995; 7:257-289.
15. van Heijst G, Falasconi S, Abu-Hanna A, Schreiber G, Stefanelli M. A case study in ontology library construction. AI in Med 1995; 7:227-255.
16. Cote RA, ed. Systematized Nomenclature of Medicine, 2nd ed. Skokie, IL: College of American Pathologists.
17. Degoulet P, Sauquet D, Jaulent MC, Zapletal E, Lavril M. Semantic interoperability in health information systems. In: Proc of the IMIA WG 6 Conf on Natural Language and Med Concept Representation, Jacksonville, Florida, 1997.