
Published in the Proceedings of the 4th European-Japanese Seminar on Information Modelling and Knowledge Bases, 31st May-3rd June 1994. To appear in Information Modelling and Knowledge Bases VI, IOS Press, Amsterdam, 1995.

Conceptual, Semantic and Information Models for Medicine

Carole Goble, Sean Bechhofer, Danny Solomon, Alan Rector, Anthony Nowlan and Andrzej Glowinski
Medical Informatics Group, Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
tel: +44 61 275 6195, fax: +44 61 275 6932, email: [email protected]

Abstract. This paper presents the intensional requirements of the medical record. It suggests that at least four models are required to cover the complexity of the medical domain. It goes on to propose that description logics are more appropriate for modelling medical information than conventional prescriptive semantic data models, and proposes a description logic capable of the constrained generation of complex compositional concepts from elementary ones and their relationships. This formalism, named GRAIL, is described in detail as a representation for medical concepts, and is shown to be capable of integrating all the intensional and extensional requirements of a medical record. Experiences with GRAIL in use in a clinical workstation, PEN&PAD, and in a large medical terminology server, part of the EU-funded project GALEN, are described extensively. The support requirements of distributed, large-scale collaborative modelling for re-use, as required for GALEN, are discussed. Examples, both medical and non-medical, are given throughout.

1. Introduction

The information in medical databases is so large, variable and complex as to render simple intensional models ineffective. A schema that relies on static, pre-defined entities and domain values and on simple semantic constraints is unsatisfactory. In medicine, incomplete, inconsistent, contradictory, uncertain, speculative and possibly irrelevant data must be collected in case it becomes useful. Prescriptive conceptual models applied to medical records have led to over-simplification, inappropriate compartmentalisation, and an inability to handle the required levels of accuracy and varying levels of granularity [1]. In traditional paper-based notes a problem might be described in one line or in two pages, and this flexibility must be reflected in any electronic medical note system. The medical community has recognised the need to describe all clinical conditions unambiguously in order to facilitate standardisation and data interchange; one of the first coding/classification systems was devised in 1749. Such systems attempt to encode and classify each and every possible clinical condition, and are either combinatorially explosive or require additional semantic control mechanisms [2].


Medical care is administered by many professionals during one episode of illness, let alone throughout a patient’s lifetime. Hence the medical record has to support a wide range of users with different viewpoints. The medical record for a patient must be viewable in a variety of ways, e.g. encounter-centred, problem-centred, task-centred and patient-centred, and as temporal snapshots. Various systems have been developed that take one or two perspectives and base their conceptual models around that view; attempts to view the information in a different way can then be difficult. A possible solution is to choose the most fine-grained and canonical form possible: one observation by an agent on a patient on a given date at a given location. This “log of observations” is, perhaps, a more accurate reflection of reality. However, the “story of observations” may well be temporally independent of the real state of the patient. Hence the model of the medical record requires fine-grained temporal attribution, but should also reflect the atemporal persistence of some clinical conditions (e.g. diabetes is a disease for life). It is essential to support not only the clinician’s history-taking task but also the reconstruction task needed to reveal the patient’s story, as against the order in which it was told, and to support retrospective enhancement of observations separately from the originals [17].

1.1. Four Intensional Models for the Medical Record

We suggest that in order to support the medical record accurately, at least four interrelated models are required: a system of medical concepts, the tasks, the actors (e.g. clinicians and patients) and the medical record structure itself. We concur with [28] that more than one model is required to capture the full requirements, and that these models must interoperate but may well have different structure, content, purpose and dimensions. The four models are shown diagrammatically in figure 1.
A brief description of the four models is given here; see [17] for further discussion.

[Figure 1 (diagram): the Medical Information System Conceptual Model, comprising the TASKS schema, the MEDICAL CONCEPTS schema, the ACTORS actor-specific schema and the MEDICAL INFORMATION MODEL structural schema, linked to one another and to the MEDICAL RECORD (patient instances) by “constrains” relationships.]
Figure 1. The four models of a patient’s medical record

The CONCEPTS model describes the clinical terminology and information model of the medical record and decision-making tasks. This is equivalent to Brachman's T-box, but certain assertions are included, as the classic T-box is too restrictive [3]. Simply put, it says “what is it possible to say”.

The TASKS model represents the process of care: implicit and explicit goals, alternative ways of achieving these, activity models, pre-defined protocols, plans and criteria for making choices. How repeated measurements are entered into the physical record is part of the task of recording information; that patients ought to have their blood pressure taken repeatedly is part of a model of the process of care. This is similar to software system specifications.

The ACTORS model represents non-observational information such as preferences, and the ways in which the actors interact with each other and with the other models. This model is related to the adaptive user models of knowledge engineering systems, in which autonomous agents actively maintain a user profile. Here the actor model does more, further restricting the intension of the medical record extension to make it actor-specific.

The MEDICAL INFORMATION model represents the structure of a patient’s record, e.g. sessions, encounters and complaints. The information model for the medical record, which describes the structural intension of the collection of instances making up the patient’s extension, is separate from the model of the concepts it uses [4]. Simply put, it says “what happened”.

The CONCEPTS model constrains what the other models may represent and, conversely, the other models place requirements on the system of concepts. It can be divided into static and dynamic ontologies. The static ontology includes statements such as ‘arthritis is a disease’. In addition the system must be semantically restricted, both statically (eyes cannot fracture) and dynamically (a disease can be a diagnosis or a problem needing treatment).
This restriction must be achieved without having to enumerate every clinical concept with all possible modifiers, as the number of such concepts is certainly large and possibly infinite, depending on the precise model. The ACTORS model represents instances of specific actors (Dr Smith and Mrs Jones) as well as schema information restricting the medical record structure and content specific to those actors, defining the assertional applicability of the CONCEPTS, TASKS and MEDICAL INFORMATION models to the individual. The CONCEPTS model states that only women can have children, but it is the ACTORS model that indicates that the patient instance is a man; this restricts the medical intensional schema, making the model concerning pregnancy inapplicable. The INFORMATION MODEL of the medical record must allow an authentic account of clinicians’ understanding and must not force clinicians into premature commitment. This permits statements that are conflicting or uncertain (clinicians need to record their doubt about symptoms, diagnoses, etc.), negative observations such as excluded diagnoses, and arbitrary levels of detail. These allow the clinician to record the patient’s story in the order of its telling, with all its contradictions, and without retrospective amendments, only enhancements. The information in the medical record itself is not about what was ‘true’ of the patient, but about what was observed and believed by clinicians. We can make inferences about what was ‘true’ on the basis of these observations with greater or lesser confidence depending on the circumstances, but these are only inferences. As in [29], truth and opinion must not be confused. This leaves us with the requirement for atemporal and temporal abstractions oriented around a patient or any other object (e.g. a disease or clinician), and implies that the model of the medical record has two parts: a series of direct observations, and meta-observations about those direct observations [4].
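The two-part record described above can be pictured as a small data structure. The Python sketch below is illustrative only: the class and field names (Observation, MetaObservation, MedicalRecord, status) are hypothetical and are not taken from PEN&PAD or GALEN. It shows one way of keeping direct observations immutable and in the order of telling, while recording later enhancements as separate meta-observations rather than amendments.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """A direct observation by an agent on a patient at a given date and location."""
    obs_id: int
    agent: str
    patient: str
    on: date
    location: str
    concept: str              # a description drawn from the CONCEPTS model
    status: str = "present"   # hypothetical values: "present", "excluded", "uncertain"


@dataclass(frozen=True)
class MetaObservation:
    """A later observation about an earlier one: an enhancement, never an edit."""
    about: int                # obs_id of the observation it qualifies
    agent: str
    on: date
    comment: str


class MedicalRecord:
    def __init__(self) -> None:
        self.observations: list[Observation] = []   # kept in the order of telling
        self.meta: list[MetaObservation] = []

    def observe(self, obs: Observation) -> None:
        self.observations.append(obs)

    def enhance(self, meta: MetaObservation) -> None:
        # Retrospective enhancement: the original observation is never modified.
        if not any(o.obs_id == meta.about for o in self.observations):
            raise KeyError(meta.about)
        self.meta.append(meta)

    def about(self, obs_id: int) -> list[MetaObservation]:
        """All meta-observations recorded about one direct observation."""
        return [m for m in self.meta if m.about == obs_id]


record = MedicalRecord()
record.observe(Observation(1, "Dr Smith", "Mrs Jones", date(1994, 1, 10),
                           "surgery", "chest pain", status="uncertain"))
record.enhance(MetaObservation(1, "Dr Smith", date(1994, 1, 12),
                               "pain now thought to have started a week earlier"))
print(record.observations[0].status)   # uncertain: the original is untouched
print(len(record.about(1)))            # 1
```

Freezing the observation classes is just one simple way of enforcing “enhancements, never amendments” in a sketch like this.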
In addition the CONCEPTS model should be application independent and able to support re-use. Bubenko [30] also suggests that eight different “worlds” or submodels are needed in requirements acquisition, four of which are represented in Figure 1: objectives and activities (different granularities of TASK), actors (ACTORS) and things (CONCEPTS and INFORMATION). Figure 1 of [30] has similarities with our figure 1 here.

In this paper we report on a compositional yet constrained classification formalism, GRAIL¹. In section 3 we describe GRAIL as a representation of a medical CONCEPTS model and give a number of medical and non-medical examples. In section 4 we describe how GRAIL can be used to represent all four models and patient data, thus being both conceptual schema and database. In section 5 we report on our experiences of its use in a prototype clinical workstation, PEN&PAD, and of developing large medical models using the formalism for a networked Terminology Server. We also report on the shortcomings of the representation and discuss some issues that have arisen. We conclude with a brief summary.

2. Semantic Data Models for Medicine

Semantic data models [5] are ‘content-oriented’ and try to capture the ‘real world’. Concrete objects like ‘patient’ are relatively straightforward to model; abstract objects like ‘disease’ or ‘diagnosis’ are not. Many attempts have been made to use Semantic Data Models (SDMs) such as Entity-Relationship Modelling to represent the electronic patient record [6], and such techniques are used by standards bodies. However, such models tend to concern themselves solely with the MEDICAL INFORMATION model and not with the others, in particular not with the CONCEPTS model. SDMs such as EER [7], NIAM [8] and even OO [9] describe objects, their relationships and their constraints in a static, prescriptive way: the concepts modelled are those set down by the designer, in the structure the designer has fixed upon. In order to model concepts we must be able to combine elementary concepts to produce new compositional concepts without having to model every single possible combination of events, conditions etc. that can occur.
These new concepts would be objects in their own right and yet inherit the properties of their ancestors. A descriptive SDM would have the flexibility and expressiveness of free text and yet be structured and controlled for standardisation, data interchange and recombination into different summaries and views. The classification-based terminological languages (also known as description logics) in the KL-ONE family [10], such as CLASSIC [11], BACK [12] and LOOM [13], look more promising as SDMs for the CONCEPTS model. Such semantic net/frame-based systems are compositional and generative, defining complex entities in terms of composite descriptions made up of a limited set of elementary concepts. As such we are able to model dynamic ontologies and continue to evolve our static ontologies. A description logic, GRAIL, and its earlier incarnation, Structured Meta Knowledge (SMK), have been developed in Manchester by the Medical Informatics Group since 1988. GRAIL is a generative representation with subsumption and multiple inheritance, based on semantic networks. Because of its powerful generative capabilities, the number of elementary concepts that must be explicitly held can be limited to a small number, making the model sparse. We have used GRAIL and SMK to integrate medical concepts, a medical record model, patient instances, actors and tasks into a single formalism, with the knowledge base of medical terminology providing the semantic control over the patients’ medical records. GRAIL is under development as part of the EU Advanced Informatics in Medicine initiative GALEN², which is developing a Terminology Server to support the development and integration of clinical systems through a range of key terminological services, built around a language-independent, re-usable and shared system of medical concepts. For detailed information on the terminology server see [18].

We will introduce GRAIL as a formalism capable of representing medical concepts (medical terminology with some assertional characteristics), going into some detail regarding its operation and capabilities. We will then discuss its application as a unified representation for the other intensional models and for the extensional medical record itself.

¹ GALEN Representation And Integration Language.
² Generalised Architecture for Languages, Encyclopaedias and Nomenclatures in Medicine, EU AIM project 2012.

3. GRAIL: a Model for Medical Concepts

GRAIL defines complex entities in terms of composite descriptions made up of a limited set of elementary concepts and assembled according to explicit rules [2,14]. In contrast to some representations, certain constructs are excluded (notably existential quantification) and others restricted, in particular universal quantification, disjunction and negation. Certain types of defeasible descriptions and default statements are included but tightly controlled. The function of GRAIL is to represent statements that allow the expression and validation of all and only semantically correct descriptions. GRAIL describes a subsumption network consisting of simple, elementary entities; bi-directional binary relationships linking concepts; and ‘particularisations’, which are new composite entities implied by the descriptive relationships. Particularisations are placed in the subsumption hierarchy by a classifier based on their structure, and are a form of implied subtype of their base supertype, inheriting the properties of that supertype. GRAIL can be viewed primarily as a representation model for supporting the creation of conceptual schemas or ontologies as collections of semantic constraints [14, 31].
GRAIL differs from its KL-ONE relatives in that it has:
a) a sanctioning constraint mechanism such that only clinically valid concepts can be combined into descriptions and allowed to enter the medical record (no ‘green headaches’ or ‘fractured livers’). This allows us to generate concepts implied by the model and guarantee their semantic correctness;
b) the inclusion of essential assertions, called necessary statements, at the concept level (‘cancers are always severe’, so a ‘mild cancer’ is a nonsensical concept); hence the CONCEPTS model is not just a model of terminology;
c) a canonisation mechanism for ensuring that all equivalent, tautologous and redundant concepts are identified and reduced to a unique canonical form (the ‘left hand’ and the ‘hand attached to the left arm’ are the same object). This replaces the SAME-AS designation found in most languages derived from KL-ONE;
d) the co-ordination of partitive hierarchies, transitive relationships and subsumption (classification) hierarchies (‘the shaft of the femur’ is a division of the femur, not a kind of femur, but a ‘fracture of the shaft of the femur’ is a ‘fracture of the femur’);
e) the integration of instances and classes into a three-layer model, not the usual two: categories (classes), individuals³ and occurrences (instances).

We will now introduce the fundamentals of GRAIL by way of examples. GRAIL has been devised for medical applications, and we have made only modest attempts to apply it elsewhere [19]. We can make no great claim to its usefulness or efficiency in other domains, other than an intuitive belief that medicine exhibits the worst case in data modelling terms. However, we will accompany our medical examples with simple examples based on a transport system.

³ Which do not have quite the same semantics as CLASSIC’s individuals.

3.1. Elementary entities and explicit subsumption

Simple, elementary entities are defined by asserting their positions in an explicit subsumption or ‘isa’ hierarchy:

1 BodyPart newSub Bone; Bone newSub Humerus; Condition newSub Fracture
2 Transport newSub Vehicle
3 (SymbolicValueType newSub TransportValueType) newSub [road rail] ⁴

The new subtypes inherit all the properties of their supertypes in the usual way of an isa inheritance hierarchy (if B isa A, then A subsumes B: the properties of A are a subset of those of B). Multiple inheritance is commonly used. Expression 3 explicitly asserts some value types: SymbolicValueType is a GRAIL primitive concept, and road and rail are siblings that are distinct, although we have not specified why or how.

3.2. Bi-directional binary relationships

Attributes are used to describe binary, bi-directional relationships between entities and to define new entities. Cardinalities are restricted to the one-or-many cases: oneMany, manyMany, manyOne and oneOne. We do not have a maximum cardinality value (unlike CLASSIC or CANDIDE [16]), mainly because we have not needed one in the medical domain, and a maximum cardinality can lead to intractability in subsumption [12].

4 DomainAttribute newAttribute hasLocation isLocationOf manyOne.
5 DomainAttribute newAttribute hasDriver isDriverOf manyOne.
6 DomainAttribute newAttribute hasAge isAgeOf manyOne.

Expression 5 defines the attribute hasDriver and its inverse, isDriverOf; the structure of the relationship is ‘topic-attribute-value’. A topic entity can be related to only one value entity through the hasDriver attribute, while a value entity can be related to many topic entities. This is similar to the roles asserted in CLASSIC, with a more limited form of the cardinality-bounds restriction constructor built in.
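As a rough illustration of expressions 1, 2 and 5, the Python sketch below models the explicit isa hierarchy as a map from each entity to its direct supertypes, with subsumption as reachability in that map, and an attribute as a named pair with an inverse and a cardinality. It is a hypothetical stand-in, not GRAIL's implementation; new_sub, subsumes and Attribute are names invented for this sketch.

```python
from dataclasses import dataclass

# Hypothetical stand-in for GRAIL's explicit "isa" hierarchy: each entity maps
# to its set of direct supertypes (multiple inheritance is allowed).
parents: dict[str, set[str]] = {}


def new_sub(supertype: str, subtype: str) -> None:
    """Record `supertype newSub subtype`, i.e. subtype isa supertype."""
    parents.setdefault(subtype, set()).add(supertype)
    parents.setdefault(supertype, set())


def subsumes(a: str, b: str) -> bool:
    """True if a subsumes b: b is a itself or a (possibly indirect) subtype of a."""
    return a == b or any(subsumes(a, p) for p in parents.get(b, set()))


@dataclass(frozen=True)
class Attribute:
    """A bi-directional binary relationship with its inverse and cardinality."""
    name: str
    inverse: str
    cardinality: str  # "oneOne", "oneMany", "manyOne" or "manyMany" (topic side first)

    def single_valued(self) -> bool:
        # "...One": each topic is related to at most one value.
        return self.cardinality.endswith("One")

    def inverted(self) -> "Attribute":
        # Swap the two sides: manyOne <-> oneMany; oneOne and manyMany are unchanged.
        topic = "one" if self.cardinality.startswith("one") else "many"
        value = self.cardinality[len(topic):]          # "One" or "Many"
        return Attribute(self.inverse, self.name, value.lower() + topic.capitalize())


# Expressions 1 and 2, and the attribute of expression 5.
new_sub("BodyPart", "Bone")
new_sub("Bone", "Humerus")
new_sub("Condition", "Fracture")
new_sub("Transport", "Vehicle")
has_driver = Attribute("hasDriver", "isDriverOf", "manyOne")

print(subsumes("BodyPart", "Humerus"))   # True
print(subsumes("Humerus", "Bone"))       # False
print(has_driver.single_valued())        # True: a vehicle has one driver
print(has_driver.inverted())             # Attribute(name='isDriverOf', inverse='hasDriver', cardinality='oneMany')
```

The inverse view shows why a single manyOne declaration suffices: read from the value side, the same relationship lets one person drive many vehicles.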
Such a relationship can now be used to relate two entities:

7 Fracture hasLocation Bone.
8 Vehicle hasDriver Person.
9 Vehicle travelsOn TransportValueType.

Subclasses of the Bone entity are now the ‘role-filler’ values of the role hasLocation when applied to the entity Fracture.

⁴ () are used to group expressions in the compiler; [] are syntactic sugar in the compiler, and these lists are expanded by the compiler before being passed to the underlying system. A further bracket notation is used to group criteria for passing to the underlying system as an intact list.

3.3. Particularisations

Using attributes we can construct complex composite concepts called particularisations. Particularisations are not explicitly asserted; they are implied by the descriptive relationship between two entities provided by the attributes. They are placed in the subsumption hierarchy by the classifier and are a form of implied subtype of their base supertype, inheriting the properties of that supertype. Such expressions do not add any new information: particularisations are inferred automatically from the intensional schema of medical concepts, and hence represent an expansion of what is already there rather than an addition to it. The which operator is used to evaluate expressions:

10 Fracture which hasLocation Bone      entity is kind of Fracture
11 Bone which isLocationOf Fracture      inverse entity is kind of Bone
12 Vehicle which

From expression 12, the Vehicle which is different from the elementary entity Vehicle, and is different from the particularisation Vehicle which, but is subsumed by both. The expression will be placed automatically by the classifier as a subtype of Vehicle by a process known as formal subsumption, using the structure of the entities and the internal semantics of GRAIL. Thus 12 will inherit all the properties of Vehicle (the particularisation’s ‘base type’). A ‘criterion’ is the name given to the combination of an attribute-value pair, e.g. hasDriver Person. We will return to formal subsumption later.

Entities can become complex and rather cumbersome, so GRAIL has a naming mechanism for aliasing entities, making the modeller’s life tolerable and the models more comprehensible:

13 (Vehicle which travelsOn road) name RoadVehicle.
14 ((Person which hasSex female) which hasAge young) name Girl.

A particularisation is reified (i.e. explicitly stored rather than evaluated on demand) if it adds new information (e.g. a name) or for efficiency reasons. Particularisations can be thought of as composition aggregations that are automatically situated in a specialisation lattice [31].

3.4. Controlling the Creation of Particularisations: Sanctioning

We need to control the generation of particularisations strictly, so that only those deemed semantically sound within the context of the domain model are successful.
We wish to exclude nonsensical concepts such as Fracture which hasLocation Tongue or Vehicle which hasDriver road. GRAIL provides a system of semantic sanctioning constraints that control and restrict the creation of self-consistent and non-redundant particularisations. The sanctioning is layered in order to offer maximum flexibility in generation and to ensure parsimony in the number of concepts the model requires for generation; each layer further restricts the possible combinations of entities that are permitted. Currently sanctioning has three layers, but we see no reason why this could not be extended. Sanctioning replaces the role-value typing mechanism of most KL-ONE-like systems, which is not powerful or flexible enough for our requirements. The sanctions are represented by three layers of statements applied to an attribute between two entities. The first consists of grammatical statements; the second consists of statements about what makes medical sense; and the third consists of statements about what is generally thought to be true, or has been said to be true in a particular case. Statements in each layer must be sanctioned by the layer above, so all three layers combine with a self-consistency test to define coherence. Each descriptive arc in the semantic network has a qualifier indicating which kind of statement it is.
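The layering rule, that each statement must be sanctioned by the layer above, can be sketched as follows. This is an illustrative Python sketch, not GRAIL's sanctioning algorithm: it assumes a statement is a (layer, topic, attribute, value) tuple, and that a statement is sanctioned when the layer directly above holds a statement on the same attribute whose topic and value subsume its own. The hierarchy is simplified to single inheritance, the third layer is represented only by necessary statements, and isa and sanctioned are hypothetical names.

```python
# Simplified elementary hierarchy (child -> parent); hypothetical, single inheritance.
PARENT = {
    "Fracture": "Condition",
    "Bone": "BodyPart",
    "Humerus": "Bone",
    "Eyebrow": "BodyPart",
}

# Layers from most permissive (grammatical) to most restrictive.
LAYERS = ["grammatical", "sensible", "necessary"]


def isa(a, b):
    """True if a is b or lies below b in the hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def sanctioned(statement, knowledge_base):
    """Is a (layer, topic, attribute, value) statement sanctioned by the layer above?"""
    layer, topic, attribute, value = statement
    level = LAYERS.index(layer)
    if level == 0:
        return True                      # grammatical statements form the top layer
    above = LAYERS[level - 1]
    return any(
        lay == above and att == attribute and isa(topic, t) and isa(value, v)
        for lay, t, att, v in knowledge_base
    )


kb = [
    ("grammatical", "Condition", "hasLocation", "BodyPart"),
    ("grammatical", "Vehicle", "hasDriver", "Person"),
    ("sensible", "Fracture", "hasLocation", "Bone"),
]

print(sanctioned(("sensible", "Fracture", "hasLocation", "Humerus"), kb))   # True
print(sanctioned(("sensible", "Vehicle", "hasDriver", "road"), kb))         # False
# No sensible statement covers eyebrows, so nothing at the third layer can use them.
print(sanctioned(("necessary", "Fracture", "hasLocation", "Eyebrow"), kb))  # False
```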


3.4.1. Grammatical Correctness

Grammatical statements represent para-linguistic facts, for example that the attribute hasLocation must have a Condition as its topic and a BodyPart as its value. Grammatical statements should not contain significant detailed domain (medical) knowledge and do not indicate whether any particular statement is domain (medically) sensible. They are used primarily to guide the process of knowledge acquisition, by validating statements such as Fracture-hasLocation-LongBone or Vehicle-hasDriver-Person and rejecting statements such as Fracture-hasLocation-Penicillin or Vehicle-hasDriver-road.

15 Condition grammatically hasLocation BodyPart.
16 Vehicle grammatically hasDriver Person.

3.4.2. Semantic Correctness

Sensible statements represent substantive domain knowledge by determining which of the grammatically correct statements are domain ‘sensible’, i.e. which statements are semantically correct. However, they contain no information about what is actually true. The sensible statement Fracture sensibly hasLocation Humerus does not imply that all fractures are located in the humerus, nor that all humeri are fractured. Instead it states that it is semantically correct to talk about fractures of the humerus. The absence of any corresponding sensible statement for eyebrows indicates that it is semantically incorrect to talk about fractures of the eyebrow.

17 Fracture sensibly hasLocation Bone.
18 RoadVehicle grammaticallyandsensibly hasDriver Person.

A medical example of a particularisation is shown graphically in Figure 2, in which a dedicated symbol represents the particularisation operator. Diabetes which affects FemoralArtery hasHormoneDependency Insulin is a subtype (not an instance) of the elementary concept Diabetes. It is semantically sensible for Diabetes to affect Arteries (FemoralArtery is a subtype), as shown by the (S)ensible relationship.
This in turn was sanctioned (indicated by the dotted arrows) by the syntactic (G)rammatical relationship between Condition and PhysiologicalSystem.

[Figure 2 (diagram): grammatical (G) links Condition affects PhysiologicalSystem and Condition hasHormoneDependency Hormone; sensible (S) links Diabetes affects Artery (shown with the Cardio-Vascular System) and Diabetes hasHormoneDependency Insulin; FemoralArtery is shown below Artery, and the resulting particularisation is Diabetes which affects FemoralArtery hasHormoneDependency Insulin.]
Figure 2. A medical example of a particularisation.
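To make the sensible-layer check concrete, the Python sketch below validates the criteria of a proposed particularisation against a list of sensible statements, accepting a criterion when some sensible statement's topic subsumes the particularisation's base and its value subsumes the criterion's value. The Artery and FemoralArtery placements are read from Figure 2; the other placements (for example Diabetes and Fracture under Condition) are assumptions, and the function name unsanctioned is hypothetical. The grammatical layer is not re-checked here: the sensible statements are assumed to be already sanctioned by it.

```python
# Hierarchy (child -> parent). Artery/FemoralArtery follow Figure 2;
# the remaining placements are assumptions made for this sketch.
PARENT = {
    "Diabetes": "Condition",
    "FemoralArtery": "Artery",
    "Artery": "CardioVascularSystem",
    "CardioVascularSystem": "PhysiologicalSystem",
    "Insulin": "Hormone",
    "Fracture": "Condition",
    "Humerus": "Bone",
    "Bone": "BodyPart",
    "Eyebrow": "BodyPart",
}

# Sensible statements as (topic, attribute, value).
SENSIBLE = [
    ("Diabetes", "affects", "Artery"),
    ("Diabetes", "hasHormoneDependency", "Insulin"),
    ("Fracture", "hasLocation", "Bone"),            # expression 17
]


def isa(a, b):
    """True if a is b or lies below b in the hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def unsanctioned(base, criteria):
    """Return the criteria of `base which ...` that no sensible statement covers."""
    return [
        (attribute, value)
        for attribute, value in criteria
        if not any(
            att == attribute and isa(base, topic) and isa(value, val)
            for topic, att, val in SENSIBLE
        )
    ]


print(unsanctioned("Diabetes", [("affects", "FemoralArtery"),
                                ("hasHormoneDependency", "Insulin")]))   # []
print(unsanctioned("Fracture", [("hasLocation", "Humerus")]))           # []
print(unsanctioned("Fracture", [("hasLocation", "Eyebrow")]))           # [('hasLocation', 'Eyebrow')]
```

Returning the offending criteria, rather than a bare yes or no, is a design choice made for the sketch; it indicates which part of a proposed description lacks a sanction.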

Figure 2 represents two levels of schema. The first is that of elementary medical concepts such as Artery, Insulin and Diabetes, together with relationships between those entities such as components, causality, locality and dependency. This higher-level schema implies and controls the existence of composite and more specialised schema concepts, which can in turn have instances in the medical record of a patient. Consequently a patient may have an instance of a composite concept such as Diabetes which affects FemoralArtery hasHormoneDependency Insulin. The model has a deductive quality in the sense that the elementary concepts and relationships form an intensional ‘object factory’, or metaschema, for more specialised concepts that are generated on demand and are usually ‘virtual’. The whole is an intension for instances of the medical record (as we shall see in section 4). In version 2 of GRAIL, cardinalities will be applicable to grammatical and sensible statements, and not just to the general attribute, giving a finer degree of control over the constraint model.

3.5. Necessary Statements: essential assertions in the CONCEPTS model

Actuality statements represent the assertional knowledge of what is generally believed to be true: the factual knowledge about pathophysiology and medical treatment. The sensible/actual distinction should not be confused with uncertainty about the truth or falsity of a statement. There are three kinds of actuality statement⁵, one of which forms the third layer of sanctioning and is used where a criterion is mandatory but is not part of the definition of a particularisation concept. A stronger statement than ‘it is sensible for hands to be part of arms’ is ‘it is conventionally true that hands are part of arms’. Such statements are called necessary statements and are similar to Brachman and Levesque’s ‘essential’ statements. They are indefeasible and, although assertional in nature, are considered part of the CONCEPTS model, extending it beyond a merely terminological model. We believe this is essential if we are to model medicine effectively. Necessary statements are used in the classification process, and contribute in a major way to the elimination of redundant criteria and the maintenance of coherence in particularisations⁶.
For example, all drivers of road vehicles must be old (19), and all road vehicles have wheels (20):

19 (Person which isDriverOf RoadVehicle) necessarily hasAge old.
20 RoadVehicle necessarily hasTractionDevice wheels.

The evaluation of the particularisation expression Girl which isDriverOf RoadVehicle would fail: the hasAge attribute has manyOne cardinality, yet both hasAge old (from (19)) and hasAge young (from (14)) would be present in the result of the expression, and they conflict. An expression whose cardinalities are satisfied is considered to be “well-formed”. The definition of a RoadVehicle is that it travelsOn road, not that it has wheels. Necessary statements describe assertions on an entity but are not part of that entity’s intrinsic definition. This is Wood’s [21] distinction between the intensional definition of a black telephone and the assertion that all telephones are black. Necessary statements are indefeasibly inherited and represent a strong statement: we have forbidden road vehicles that do not have wheels. We can also have asymmetric necessary qualification, where one side is necessary and the other is not (topicNecessary and valueNecessary), which effectively allows us to model the mandatory and optional participation of entities in the attribute relationship [14].
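The failure of Girl which isDriverOf RoadVehicle can be traced step by step. The Python sketch below (hypothetical names throughout, not GRAIL's classifier) completes a criteria set by repeatedly adding the criterion of any necessary statement whose topic subsumes the concept, then reports pairs of values on a single-valued (manyOne) attribute where neither value subsumes the other. For brevity Girl is written out as its expansion from (14), and values are treated as elementary entities.

```python
from itertools import combinations

# Values as elementary entities (child -> parent); hypothetical value types.
PARENT = {"young": "AgeValue", "old": "AgeValue", "female": "SexValue"}
SINGLE_VALUED = {"hasAge", "hasSex"}            # manyOne attributes, cf. expression 6

# Necessary statements as (topic base, topic criteria, necessary criterion).
# (Person which isDriverOf RoadVehicle) necessarily hasAge old.   (expression 19)
NECESSARY = [("Person", {("isDriverOf", "RoadVehicle")}, ("hasAge", "old"))]


def isa(a, b):
    """True if a is b or lies below b in the hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def covers(general, specific):
    """Every general criterion is matched by an equal-or-more-specific one."""
    return all(any(a2 == a1 and isa(v2, v1) for a2, v2 in specific)
               for a1, v1 in general)


def complete_criteria(base, criteria):
    """Add necessary criteria whose topic subsumes the concept, until a fixpoint."""
    complete = set(criteria)
    changed = True
    while changed:
        changed = False
        for topic_base, topic_criteria, necessary in NECESSARY:
            if (isa(base, topic_base) and covers(topic_criteria, complete)
                    and necessary not in complete):
                complete.add(necessary)
                changed = True
    return complete


def cardinality_conflicts(criteria):
    """Value pairs on a single-valued attribute where neither subsumes the other."""
    return [(a1, v1, v2)
            for (a1, v1), (a2, v2) in combinations(sorted(criteria), 2)
            if a1 == a2 and a1 in SINGLE_VALUED
            and not (isa(v1, v2) or isa(v2, v1))]


# Girl which isDriverOf RoadVehicle, with Girl expanded as in expression 14.
girl_driver = complete_criteria(
    "Person", {("hasSex", "female"), ("hasAge", "young"), ("isDriverOf", "RoadVehicle")})
print(cardinality_conflicts(girl_driver))    # [('hasAge', 'old', 'young')]

driver = complete_criteria("Person", {("isDriverOf", "RoadVehicle")})
print(cardinality_conflicts(driver))         # []
```

The second case is well-formed: the driver simply acquires hasAge old in its complete criteria set, which is the role necessary statements play in the definition of a particularisation.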

⁵ We will discuss defaults in section 3.12 and factual assertions in section 4.
⁶ We will discuss this momentarily.


3.6. Formal Subsumption between Particularisations

Before continuing we should summarise the definition of a particularisation. A particularisation is composed of:
• a base entity: its superclass elementary entity;
• a defining criteria set: those descriptions which define its composition and are definitional and indefeasible (the sufficient conditions making up the definition of the concept);
• a complete criteria set: those descriptions which are necessarily true of the entity, whether from its definition, from necessary statements or by inheritance from ancestor entities (the necessary conditions which must be true of the concept).

For example, given the examples so far, the particularisation

21 RoadVehicle which hasFuel petrol.

has:
base entity: Vehicle
defining criteria set: hasFuel-petrol, travelsOn-road
complete criteria set: hasFuel-petrol, travelsOn-road, hasTractionDevice-wheels

travelsOn-road comes from the expansion of the name RoadVehicle, and hasTractionDevice-wheels from a necessary statement inherited from the RoadVehicle particularisation.

3.7. Eliminating redundancy: canonisation

Canonisation is a process that ensures that all equivalent, tautologous and redundant entities are identified and reduced to a unique canonical form. In this way we ensure that the ‘left hand’ and the ‘hand attached to the left arm’ are the same object. Consider the expression

22 Girl which hasSex female.

This is a well-formed and sensible expression. However, the criterion hasSex female is unnecessary, as it is part of the defining criteria set of the entity Girl (from (14)). The GRAIL classifier can detect such redundancies and reduces entities to a canonical form: Girl which hasSex female and Girl will be the same entity, with the same internal entity id. Necessary statements play an important role in canonisation. Consider:

23 (RoadVehicle which hasFuel petrol) name Car.
24 (Person which isDriverOf Car) name CarDriver.
The entities CarDriver and CarDriver which hasAge-old will be the same, as CarDriver will inherit hasAge-old from the necessary statement introduced in (19). Canonisation is the most difficult process undertaken by the classifier. The simplest example is that of duplication: (Problem which isLocatedIn-Bus) isLocatedIn-Bus is obviously just a Problem which isLocatedIn-Bus. Such redundancies can be removed without knowing where the concept lies in the hierarchy. A more complex situation arises when the concept has inherited information that forces redundancy. For instance, if Vehicles are asserted to be powered by fuel,

25 Vehicle necessarily isPoweredBy-fuel.
26 Vehicle newSub Bus.

and we attempt to classify a Bus which isPoweredBy-fuel, the concept will initially be classified under Vehicle. At that point we can see that the Bus will be powered by fuel by virtue of being a Vehicle, and we should really just be looking at the Bus entity, as Bus and Bus which isPoweredBy-fuel are the same thing. Similarly, if criteria subsume inherited ones, they are removed. If we consider a Problem which isLocatedIn-Bus isLocatedIn-Vehicle, we can see that, as a Bus is a Vehicle, the criterion asserting the location in the Vehicle is redundant and can be removed.

Canonisation is made more complex by the fact that every attribute has an inverse. This leads to problems with concepts like:

27 Bus which isLocationOf (Problem which hasLocation Bus).

We can see (assuming that isLocationOf and hasLocation are inverses) that this is really just the entity

28 Bus which isLocationOf Problem.

If an entity is to be in canonical form, all sub-expressions must be in canonical form, and all “inversions” should also be in canonical form. This is currently an issue with the GRAIL implementation, as this “canonisation of inversions” is a computationally expensive operation.

Another task performed during canonisation is the checking of cardinality. For instance, the hasLocation relation is single-valued, as a concept can only have one location. Note that cardinality can only be checked correctly for concepts in canonical form. If we consider the concept

29 Problem which hasLocation Bus hasLocation Vehicle.

it can be seen to have two concepts related via the hasLocation attribute. After canonisation, however, as we saw above, the criterion hasLocation-Vehicle will be removed, leaving a single hasLocation relation and thus a coherent definition. In contrast, the concept

30 Problem which hasLocation Bus hasLocation Train.

can be seen to be nonsensical, and will be rejected: there is no redundancy here, and there are two values or “role-fillers” for the hasLocation relation. Where the relation or attribute is multi-valued, no such coherency problems arise. The notion of cardinality is closely linked with the issue of canonisation.
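The canonisation steps described above can be approximated for flat criteria in a few lines. In this hypothetical Python sketch (not GRAIL's algorithm), exact duplicates collapse in a set; a criterion is dropped if an inherited necessary criterion on the same attribute has an equal or more specific value; a criterion on a single-valued attribute is dropped if a sibling criterion is more specific; and any remaining pair of values on a single-valued attribute is rejected as incoherent. Nested concepts and the canonisation of inversions are not handled, and the multi-valued case of expression 32 is deliberately left alone, since the text treats it as an open question.

```python
from itertools import combinations

# Hypothetical elementary hierarchy (child -> parent).
PARENT = {"Bus": "Vehicle", "Train": "Vehicle", "Puncture": "Problem"}
# Necessary criteria, inherited by every subtype of the keyed entity (expression 25).
NECESSARY = {"Vehicle": {("isPoweredBy", "fuel")}}
SINGLE_VALUED = {"hasLocation"}                  # a concept has only one location


def isa(a, b):
    """True if a is b or lies below b in the hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def inherited(base):
    """Necessary criteria attached to the base entity or any of its ancestors."""
    found, entity = set(), base
    while entity is not None:
        found |= NECESSARY.get(entity, set())
        entity = PARENT.get(entity)
    return found


def implied(criterion, others):
    """True if another criterion on the same attribute has an equal or more specific value."""
    attribute, value = criterion
    return any(a == attribute and isa(v, value) for a, v in others)


def canonise(base, criteria):
    criteria = set(criteria)                     # exact duplicates collapse here
    background = inherited(base)
    kept = set()
    for c in criteria:
        if implied(c, background):
            continue                             # already true by inheritance
        siblings = [o for o in criteria if o != c]
        if c[0] in SINGLE_VALUED and implied(c, siblings):
            continue                             # a more specific value is present
        kept.add(c)
    for (a1, v1), (a2, v2) in combinations(sorted(kept), 2):
        if a1 == a2 and a1 in SINGLE_VALUED:
            raise ValueError(f"incoherent: {a1} has both {v1} and {v2}")
    return base, frozenset(kept)


print(canonise("Bus", [("isPoweredBy", "fuel")]))     # ('Bus', frozenset()), i.e. just Bus
print(canonise("Problem", [("hasLocation", "Bus"), ("hasLocation", "Vehicle")]))
# ('Problem', frozenset({('hasLocation', 'Bus')}))
try:
    canonise("Problem", [("hasLocation", "Bus"), ("hasLocation", "Train")])
except ValueError as err:
    print(err)                                   # incoherent: hasLocation has both Bus and Train
# Expression 32: isLocationOf is not single-valued here, so both criteria are kept.
print(len(canonise("Bus", [("isLocationOf", "Problem"), ("isLocationOf", "Puncture")])[1]))  # 2
```

Checking cardinality only after redundant criteria have been removed mirrors the point made above: expression 29 is coherent once hasLocation-Vehicle is dropped, while expression 30 is not.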
In the situation above, where we had

(31) Problem which hasLocation Bus hasLocation Vehicle.

it was obvious that the concepts Bus and Vehicle needed to be “joined” together. In situations such as

(32) Bus which isLocationOf Problem isLocationOf Puncture.

(where Puncture is a kind of Problem), it is less clear whether this should simply canonise to

(33) Bus which isLocationOf Puncture.

These areas of canonisation are currently being explored. Other processes used in canonisation, such as interiorisation and exteriorisation (moving criteria deeper into or further out of the criteria set), are still at an early stage of exploration. Preliminary work is described in [22].

3.8. Part-Whole relationships: refinement and transitivity

An essential characteristic of medicine is the importance of part-whole relationships. A shaft (the centre of the bone) is a division of the humerus, not a kind of humerus. Nevertheless, a ‘fracture of the shaft of the humerus’ is a ‘fracture of the humerus’ and must be classified accordingly. This is particularly important when modelling anatomy. GRAIL must co-ordinate partitive hierarchies and subsumption hierarchies, and endow the divisionOf relationship with special transitive properties.

Subsumption is an obvious example of a transitive relation. If a Bus is a kind of RoadVehicle, and a RoadVehicle is a kind of Vehicle, then a Bus is a kind of Vehicle. Other relations, for instance partitive ones, also behave in this way. If Nut is part of Wheel, which is part of Car, then Nut is part of Car.

Refinement is a related property which describes how relations interact with one another. The subsumption relation is again an example of one which behaves in this manner. If a Bus has a driver Man, and Man is a kind of Person, then the Bus has a driver Person. By this we mean that the concept

(34) Bus which hasDriver Man.

is subsumed by the concept

(35) Bus which hasDriver Person.

Partitive relations also exhibit this behaviour. If an Engine is part of a Bus, and we have a Problem which is located in the Engine, then it is also located in the Bus. The concept

(36) Problem which hasLocation (Engine which isPartOf Bus).

is subsumed by the concept

(37) Problem which hasLocation Bus.

The mechanisms to deal with this are bound up in the subsumption rules for criteria. Note that in the above example, Bus and Engine which isPartOf Bus are not related via the subsumption hierarchy. However, the criteria formed by combining these concepts with the hasLocation attribute are related by subsumption. GRAIL provides facilities for specifying which relations are transitive and which are refined along one another, and the subsumption algorithms and tests take these specifications into account [20]. Hence transitive specialisation as discussed in [31] is extended to deal with the aggregation/decomposition dimension.

3.9. Negation and sanction cancellation

GRAIL is purposely restricted to avoid known areas of difficulty in knowledge representation, one of which is unconstrained negation. However, to model medicine as accurately as conventional clinical coding schemes do, we require a limited but highly controlled version of negation. Relative complements (e.g. endocrine diseases which are not diabetes, or symptoms other than pain) are useful and seem plausible. Negative criteria (e.g. an ulcer which has been said not to cause pain) introduce additional complexity. Work is continuing in both of these areas [20, 22]. Absolute complements (e.g. non-diabetes) are not contemplated. Sanction cancellation has been explored as a workaround for the absence of negative concepts. The semantics of cancellation are not well understood, and its application often leads to overly complicated and incomprehensible models. Cancellation can cause particular difficulties when applied to particularisations in the ACTORS model, as we shall see in section 4.
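The transitivity and refinement behaviour described in section 3.8 can be illustrated with a small sketch. It makes several simplifying assumptions:

- A single partitive relation is used, with the paper's divisionOf folded into isPartOf.
- Part-whole facts are attached to base entities rather than to composite concepts such as Engine which isPartOf Bus.
- The names `HumerusShaft` and `REFINED_ALONG_PARTS` are hypothetical.
- Which attributes are refined along parts is also an assumption.

```python
from typing import Dict, Set, Tuple

Criterion = Tuple[str, str]  # (attribute, filler entity)

# Hypothetical toy facts taken from the examples in section 3.8.
KIND_OF: Dict[str, Set[str]] = {"Man": {"Person"}, "Bus": {"Vehicle"}}
PART_OF: Dict[str, Set[str]] = {
    "Nut": {"Wheel"}, "Wheel": {"Car"}, "Engine": {"Bus"},
    "HumerusShaft": {"Humerus"},  # divisionOf, folded into isPartOf here
}
# Attributes whose criteria are refined along isPartOf: a location in a part
# is also a location in the whole. Which attributes qualify is an assumption.
REFINED_ALONG_PARTS = {"hasLocation"}


def closure(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """Transitive closure: every node reachable from `node` by one or more edges."""
    seen: Set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, ()))
    return seen


def kind_of(specific: str, general: str) -> bool:
    return specific == general or general in closure(KIND_OF, specific)


def part_of(part: str, whole: str) -> bool:
    return whole in closure(PART_OF, part)


def criterion_subsumes(general: Criterion, specific: Criterion) -> bool:
    attr, g = general
    if attr != specific[0]:
        return False
    s = specific[1]
    # Refinement along subsumption: hasDriver-Man is subsumed by hasDriver-Person.
    if kind_of(s, g):
        return True
    # Refinement along part-whole: hasLocation-Engine is subsumed by
    # hasLocation-Bus, and by hasLocation-Vehicle since a Bus is a Vehicle.
    if attr in REFINED_ALONG_PARTS:
        return any(kind_of(whole, g) for whole in closure(PART_OF, s))
    return False
```

Note that `kind_of("Engine", "Bus")` is false while the corresponding hasLocation criteria still subsume. This mirrors the point that the base entities are unrelated by subsumption even though the criteria are.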



3.10. The Classification Process

The number of composite clinical concepts that can be dynamically and safely generated is vast. Our experiments based on existing medical terminologies suggest that the number of concepts will be in excess of 1 million. However, connectivity is more important. Each concept is classified in at least 4-5 ways, giving of the order of 1-10 million edges in the semantic network. To manage this number of concepts, most are ‘virtual’ and generated on demand. A hierarchy or lattice is maintained, with concepts placed above and below one another according to the subsumption relationship. To maintain this hierarchy, new concepts are classified when they are encountered or generated.

The basic subsumption test is: given concepts X and Y, X subsumes Y iff

a. X and Y are not equal;
b. X and Y are in canonical form;
c. the base entity of X subsumes or is equal to that of Y;
d. the defining criteria set of X subsumes or is equal to the complete criteria set of Y, i.e. every criterion in the defining criteria set of X subsumes or is equal to some criterion in the complete criteria set of Y.

Complications arise because of transitivity; these are not discussed here but are covered in [20, 22]. Subsumption of criteria is essentially a recursive notion that relies on the subsumption of the constituent entities.

The classifier has two functions:

1. To determine the legitimacy of a concept: given a concept X, declare its legality against the schema. Such a request is treated as the question ‘can I create X and correctly place it into the classification lattice without error?’. If the new concept X is successfully classified, the query succeeds. Thus all queries to the GRAIL model are entities, and queries are answered by classification (in much the same way as in CANDIDE [16]).
2. To generate predictive data from the concepts: given a concept X, what is known about this concept from the schema?
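Conditions (a)-(d) of the subsumption test translate fairly directly into a recursive function. The sketch below is a hypothetical illustration, not the GRAIL classifier. It assumes concepts are already canonical (condition b), represents them with a made-up `Concept` dataclass whose `necessary` field holds statements the concept carries beyond its definition, and ignores transitivity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical base-entity hierarchy: child -> direct parents.
PARENTS: Dict[str, Set[str]] = {
    "Bus": {"RoadVehicle"}, "RoadVehicle": {"Vehicle"}, "Man": {"Person"},
}


def base_subsumes_or_equal(x: str, y: str) -> bool:
    """True if base entity x equals y or is one of y's ancestors."""
    frontier, seen = [y], set()
    while frontier:
        e = frontier.pop()
        if e == x:
            return True
        if e not in seen:
            seen.add(e)
            frontier.extend(PARENTS.get(e, ()))
    return False


@dataclass(frozen=True)
class Concept:
    """A canonical concept: a base entity plus (attribute, filler) criteria."""
    base: str
    defining: FrozenSet[Tuple[str, "Concept"]] = frozenset()
    necessary: FrozenSet[Tuple[str, "Concept"]] = frozenset()

    @property
    def complete(self) -> FrozenSet[Tuple[str, "Concept"]]:
        return self.defining | self.necessary


def subsumes_or_equal(x: Concept, y: Concept) -> bool:
    # (c) the base entity of X subsumes or equals that of Y
    if not base_subsumes_or_equal(x.base, y.base):
        return False
    # (d) every defining criterion of X subsumes or equals some criterion in
    #     the complete criteria set of Y; criterion subsumption recurses on fillers.
    return all(
        any(ax == ay and subsumes_or_equal(fx, fy) for ay, fy in y.complete)
        for ax, fx in x.defining
    )


def subsumes(x: Concept, y: Concept) -> bool:
    # (b) is assumed: callers pass canonical forms, so structural equality of
    # the dataclasses can stand in for (a), "X and Y are not equal".
    return x != y and subsumes_or_equal(x, y)
```

For example, `Concept("Bus", frozenset({("hasDriver", Concept("Person"))}))` subsumes the same concept with `Man` as the filler, reproducing (34) and (35). Only X's defining criteria are consulted, but Y's necessary statements also count, as condition (d) requires.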
This second function goes further than the first. Not only does it locate the concept, it then generates concepts connected to X based on the permissible sanctions and the definition of X. This approach was used for the PEN&PAD clinical workstation described in section 4.

The classification process involves three different stages:

• Classification: determining a concept’s place in the hierarchy, i.e. its parents and children.
• Canonisation: removing redundancies and tautologies, and checking a concept for consistency and coherence.
• Sanctioning: testing that a concept is legitimate according to the existing sanctions.

These cannot be performed independently. Classification and canonisation are inextricably linked. As a concept inherits information such as necessary statements from its parents, we cannot completely canonise a concept until we know its place in the hierarchy. Nor can we completely classify a concept until we have it in a canonised, or ‘canonical’, form.

Classification starts with an attempt to remove any obvious redundancy, e.g. duplicated information; such redundancy can be removed before the concept’s place in the hierarchy is known. After this initial canonisation, a first attempt at classification is made, which involves locating all the parents of the concept being classified. Children need not be identified at this point. The parents matter because they may endow their children with extra information needed for canonisation or sanctioning. Children do not add any properties to a concept, so the classifier can wait until all other tasks have been performed before finding them. Finding parents essentially involves a traversal down the subsumption lattice until a concept is found which subsumes the entity being classified but has no children which themselves subsume that entity. We have then found an “immediate” parent of the entity, and the entity is placed in the hierarchy at that point.

Once this first attempt at classification has been made, canonisation is performed. The concept may have inherited information from a parent which causes redundancy in the classified concept. If any redundancy is present, it is resolved, and reclassification is attempted. Once this cycle of classification and canonisation has finished, the concept will have been placed in its correct position within the hierarchy. We can now sanction the concept. This simply involves checking that each of its criteria has been sanctioned to the correct level, either for this particular concept or for a parent (as sanctions are inherited). Finally, if the above has completed successfully, we identify the children of the concept. This is similar to finding parents: an “immediate” child is a concept which is subsumed by the classified entity but has no parents that are themselves subsumed by it.

Various optimisations of the classification algorithm, which is basically a graph traversal, are being investigated. These include reducing the search space, caching results, and propagating test results and conditions to allow early termination of searches [20]. Some problems remain to be overcome. In particular, arbitrary explicit subsumptions between complex entities (rather than between base entities) lead to intractability. For this reason they are severely restricted in the current GRAIL implementation.

3.11. Classifying in the presence of necessary statements

A ‘necessary’ statement expresses criteria which are necessary but not part of the defining criteria set, as they are not among the ‘sufficient’ criteria. Necessary statements are used in upward classification but not in downward classification. Consider figure 3, which revisits our Road Vehicle example.

[Figure 3: Vehicle at the top. Beneath it are two concepts:
- X: RoadVehicle = (Vehicle which travelsOn road). Base Vehicle; defining {travelsOn-road}; complete {travelsOn-road, hasTractionDevice-wheels}; necessary statement hasTractionDevice-wheels.
- Z: Vehicle which hasFuel superGas. Base Vehicle; defining {hasFuel-superGas}; complete {hasFuel-superGas, hasTractionDevice-skis}; necessary statement hasTractionDevice-skis.
Beneath both is Y: RoadVehicle which hasFuel superGas. Base Vehicle; defining {travelsOn-road, hasFuel-superGas}; complete {hasFuel-superGas, travelsOn-road, hasTractionDevice-wheels, hasTractionDevice-skis}, with the necessary statements acquired by inheritance.]
Figure 3. Classification with necessary statements
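Figure 3 can be replayed with a toy classifier. It walks down from the top of the lattice to find immediate parents, collects the necessary statements of every subsumer to build the complete criteria set, and then checks single-valued attributes. Everything here (the `Node` class, `CHILDREN`, the functions) is a hypothetical sketch. Base-entity subsumption is omitted because every node in the figure has the base Vehicle, fillers are atomic, and canonisation and sanctioning are skipped.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Criterion = Tuple[str, str]  # e.g. ("travelsOn", "road")


@dataclass(frozen=True)
class Node:
    """A named concept: defining (sufficient) criteria plus necessary
    statements, which are not part of its definition."""
    name: str
    defining: FrozenSet[Criterion]
    necessary: FrozenSet[Criterion] = frozenset()


# Figure 3 as toy data.
VEHICLE = Node("Vehicle", frozenset())
X = Node("RoadVehicle", frozenset({("travelsOn", "road")}),
         frozenset({("hasTractionDevice", "wheels")}))
Z = Node("Vehicle which hasFuel superGas", frozenset({("hasFuel", "superGas")}),
         frozenset({("hasTractionDevice", "skis")}))
CHILDREN: Dict[str, List[Node]] = {"Vehicle": [X, Z]}
SINGLE_VALUED = {"hasTractionDevice"}


def covers(node: Node, criteria: FrozenSet[Criterion]) -> bool:
    # Downward test: only the node's defining criteria matter; its necessary
    # statements play no part in deciding where a new concept goes.
    return node.defining <= criteria


def find_subsumers(criteria: FrozenSet[Criterion]) -> Tuple[List[Node], List[Node]]:
    """Walk down from the top; return (all subsumers, immediate parents)."""
    subsumers: List[Node] = []
    parents: List[Node] = []

    def visit(node: Node) -> None:
        if node in subsumers:
            return
        subsumers.append(node)
        below = [c for c in CHILDREN.get(node.name, []) if covers(c, criteria)]
        if not below:  # subsumes the concept, but none of its children does
            parents.append(node)
        for child in below:
            visit(child)

    visit(VEHICLE)  # the top node subsumes every concept in this toy lattice
    return subsumers, parents


def classify(defining: FrozenSet[Criterion]) -> Tuple[List[str], FrozenSet[Criterion]]:
    subsumers, parents = find_subsumers(defining)
    # Upward step: the complete set adds every inherited necessary statement.
    complete = set(defining)
    for node in subsumers:
        complete |= node.necessary
    for attr in SINGLE_VALUED:
        fillers = sorted(v for a, v in complete if a == attr)
        if len(fillers) > 1:
            raise ValueError(f"cannot classify: {attr} would have fillers {fillers}")
    return [p.name for p in parents], frozenset(complete)
```

Classifying Y (`travelsOn-road` plus `hasFuel-superGas`) finds both X and Z as immediate parents. It then raises `ValueError`, because the complete set would hold both `hasTractionDevice-wheels` and `hasTractionDevice-skis`.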


When classifying X (or Z) downwards, the necessary statement hasTractionDevice-wheels (or hasTractionDevice-skis) is irrelevant to the classification. Only the defining criteria (the sufficiency definition) are important for positioning the concept above Y. However, when classifying Y upwards beneath X, the necessary statement becomes important, and we must use the complete criteria set of Y, including the necessary statements. In this example Y would inherit both hasTractionDevice-wheels (from X) and hasTractionDevice-skis (from Z). As hasTractionDevice is a single valued attribute, Y will fail to be classified.

3.12. Defaults

The second kind of assertional statement available in GRAIL is the default statement, which sits beneath the sensible statement in the classifier hierarchy. Defaults do not add classification knowledge. Instead, they allow us to assert facts and relationships about entities which have no bearing on their classification but can be inherited by other entities. Defaults are intended as a mechanism for ‘hanging’ external knowledge onto the classification lattice for use by external tools, for example for reasoning. Defaults have been used in the ACTORS and TASKS models to represent tailored user interfaces and to relate the clinical dialogue to the CONCEPTS model.

4. The Unification of Concepts, Information, Actors and Instances

Section 3 presented in some detail the GRAIL formalism within the context of the medical concepts model. There, medical terminology and conceptual assertions form a schema that constrains what can be said in the other parts of the medical information system conceptual model, and that provides classes for patient instances of concrete data. GRAIL⁷ has been used in an attempt to integrate all four of the models described in figure 1 into one unified model, using particularisations and sanctioned relationships to hold both schema and patient instance data. This ensures that the data in the clinical record is accurate and semantically correct, and allows the same classification mechanisms to be used for data as for concepts.
Such a unified model has been developed for a prototype clinical workstation called PEN&PAD [15]. The unified model of actors, concepts and data has three layers rather than the usual two: categories (i.e. classes), individuals, and occurrences (i.e. instances). The model’s entities and particularisations are partitioned into the three spaces, as illustrated in figure 4.

[Figure 4: category space holds the MEDICAL CONCEPTS, MEDICAL RECORD, ACTORS and TASKS schemas; individual space holds instances of actors and concepts specific to an actor; occurrence space holds instances of the MEDICAL RECORD.]
Figure 4. The unified three space model of the medical record

⁷ Strictly speaking, GRAIL’s predecessor, SMK, was used for the unified model.



4.1. Category space

Category space models the system of medical concepts, the medical record structure and tasks. It constrains the generation of medical concepts and drives the constraint mechanism. It holds grammatical and sensible sanctioning relationships, making it the medical record schema for patient instances. Statements about categories represent the abstract, general or intended behaviour of objects.

4.2. Occurrence space

This space models the ‘authentic’ historical part of the medical record. Occurrence space contains instances of the medical schema ‘classes’, i.e. the patient’s medical record itself. Particularisations in this space are data instances of the MEDICAL RECORD called ‘occurrences’. Within the medical record, the statement that ‘Jane Smith’s fracture is located in the humerus’ is an actual observation, and represents the second kind of actuality statement in GRAIL. This statement is sanctioned by the statement that ‘the humerus is a sensible location for fractures’, which was in turn sanctioned by the grammatical statement that ‘conditions are located in body parts’. These are like the individuals in CLASSIC, but there is no ambiguity between concept and instance.

4.3. Individual space

Individual space represents concepts that are neither part of the full medical record schema (category space) nor observations (occurrence space). Most object-oriented systems have two levels of abstraction (classes and instances). However, because the medical record is made up of observations localised at a particular point in time and space, GRAIL requires an additional level of abstraction. The individual space represents:

(i) concrete instances of actors such as ‘Mrs Jones’ and ‘Dr. Smith’. These are instances of ACTORS category concepts such as Patient and Doctor.
(ii) concrete instances of categories which persist in time and space about the individual patient’s medical record, such as Mrs Jones’s diabetes or Dr. Smith’s treatment preferences. These instances are individual particularisations and form part of the ACTORS schema for those actors’ instances (occurrence particularisations). They may include atemporal and aspatial assertions about the patient.

Choices concerning what level of detail should constitute an instance, such as those described by Brachman in [23], are avoided by restricting individuals and occurrences to concrete instances in the real world and their properties. GRAIL allows concepts to be further particularised to an arbitrary degree, effectively continually evolving the concepts schema, so such a choice is nonsensical. This is closely analogous to Sowa’s treatment of types (cf. categories) as lambda abstractions over instances, and hence fundamentally different from them [24]. GRAIL models the ‘level of specification’ required by separate applications as explicit ‘external’ knowledge about those applications. For example, for statistical purposes “amputation of finger” is an adequate level of specification. For the purposes of directing a surgeon, the laterality and selector of the finger, “the left third finger”, must be specified. These levels of specification are described by meta-attributes which we do not have space to discuss here.

Figure 5 illustrates the three spaces; the example is taken from [17]. (I1) is an individual instance of a category concept (C1): Patient is part of the MEDICAL RECORD schema, and Mrs Jones is an instance of an actor in the ACTORS model. The individual particularisation (I2) is an instance of its base type (C3): Diabetes which isHadBy Mrs Jones is an instance of Diabetes, not a type of Diabetes. In individual and occurrence space, particularisations are instances of their base entity, not subtypes of it as in category space. (I2) is both an instance of the CONCEPTS schema and part of the ACTORS schema. The occurrence particularisation (O3) is a MEDICAL RECORD instance of Mrs Jones’ diabetes (I2), not an instance of the concept diabetes (C3). By attaching further constraining relationships to the individual (I2), we can represent actor-specific schema concepts. For example, we can cancel a relationship which, though true of diabetes, is not appropriate for Mrs Jones. All occurrences of Mrs Jones’ diabetes will be instances of (I2), so (I2) can serve as an index into occurrence space, or as an abstraction of it.

The occurrence instances of clinical concepts are always grounded to a date, place, agent and observed individual. (O1) represents the concept of a clinical session (an agent at a place on a date), and (O2) adds an observed individual and represents a clinical encounter. These occurrences are created because they may have assertions concerning them, such as that a piece of equipment was faulty during one particular session. (O1) and (O2) are instances of the MEDICAL RECORD model. Occurrences are fine-grained and, once asserted, cannot be deleted, only modified by another occurrence particularisation. This, together with the use of occurrence particularisations as instances of individual particularisations, causes problems which are discussed in section 5. The presence of a particularisation in category space merely implies the possibility of its creation. Occurrence particularisations, however, are stronger than this in that they are factual instances of the medical record.
Hence they must be promoted to data, rather than remaining as mere possibilities, by an assertion relationship between an occurrence (e.g. O3) and its ‘parent’ occurrence (e.g. O2). These relationships carry a qualifier which states in what form the relationship holds; the values of this qualifier may be yes, query (uncertain) or no. It is therefore possible to represent statements such as ‘Jane Smith may have diabetes’ or ‘Jane Smith does not have diabetes’.

Category space
(C1) Patient
(C2) Condition
(C3) Diabetes
(isHadBy links, marked G and S)

Individual space
(I1) Mrs Jones
(I2) Diabetes which isHadBy Mrs Jones

Occurrence space
(O1) Dr Smith who isProviderAt Infirmary on 1/4/93
(O2) Mrs Jones who isSeenBy (Dr Smith who isProviderAt Infirmary on 1/4/93)
(O3) Diabetes which isHadBy (Mrs Jones who isSeenBy (Dr Smith who isProviderAt Infirmary on 1/4/93))

Figure 5. An example of the three spaces
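The distinction between subtype links in category space and instance links in the other two spaces can be sketched as follows. The sketch also shows the qualified assertion that promotes an occurrence to data. The classes and the append-only `MedicalRecord` are assumptions made for illustration, as is grounding (O1) in an individual Dr Smith (an instance of Doctor, as in section 4.3). GRAIL itself does not expose this API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Space(Enum):
    CATEGORY = "category"
    INDIVIDUAL = "individual"
    OCCURRENCE = "occurrence"


class Qualifier(Enum):
    YES = "yes"
    QUERY = "query"  # uncertain
    NO = "no"


@dataclass(frozen=True)
class Particularisation:
    label: str
    space: Space
    base: Optional["Particularisation"] = None

    def link_to_base(self) -> str:
        # In category space a particularisation is a subtype of its base;
        # in individual and occurrence space it is an instance of its base.
        return "subtype of" if self.space is Space.CATEGORY else "instance of"

    def category(self) -> "Particularisation":
        """Follow base links (e.g. O3 -> I2 -> C3) up to category space."""
        node = self
        while node.space is not Space.CATEGORY:
            if node.base is None:
                raise ValueError(f"{node.label} is not grounded in category space")
            node = node.base
        return node


class MedicalRecord:
    """Hypothetical append-only store: asserted occurrences are never deleted."""

    def __init__(self) -> None:
        self._assertions: List[Tuple[Particularisation, Particularisation, Qualifier]] = []

    def assert_occurrence(self, occ: Particularisation, parent: Particularisation,
                          qualifier: Qualifier) -> None:
        if occ.space is not Space.OCCURRENCE or parent.space is not Space.OCCURRENCE:
            raise ValueError("only occurrences can be promoted to data")
        self._assertions.append((occ, parent, qualifier))

    def assertions(self) -> List[Tuple[Particularisation, Particularisation, Qualifier]]:
        return list(self._assertions)  # a copy; no delete operation is offered


# Figure 5 as data.
C1 = Particularisation("Patient", Space.CATEGORY)
C3 = Particularisation("Diabetes", Space.CATEGORY)
I1 = Particularisation("Mrs Jones", Space.INDIVIDUAL, C1)
I2 = Particularisation("Diabetes which isHadBy Mrs Jones", Space.INDIVIDUAL, C3)
DR = Particularisation("Dr Smith", Space.INDIVIDUAL, Particularisation("Doctor", Space.CATEGORY))
O1 = Particularisation("Dr Smith who isProviderAt Infirmary on 1/4/93", Space.OCCURRENCE, DR)
O2 = Particularisation("Mrs Jones who isSeenBy (O1)", Space.OCCURRENCE, I1)
O3 = Particularisation("Diabetes which isHadBy (O2)", Space.OCCURRENCE, I2)

record = MedicalRecord()
record.assert_occurrence(O3, O2, Qualifier.QUERY)  # 'Mrs Jones may have diabetes'
```

Here (O3) reaches the concept Diabetes only through (I2), which is what allows (I2) to act as an index into occurrence space.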

5. Practical examples of GRAIL in use

GRAIL and its predecessor SMK were not devised as academic exercises in description logics or knowledge representation. SMK was devised to control advanced user interfaces for clinical workstations and to model an authentic and accurate observation-based medical record. The GALEN project evolved SMK into GRAIL: an application-independent medical concept model with medical terminology as the definitional ‘backbone’ of the concept model. We are not concerned with academic ‘worst cases’ or general cases. We are working within an application framework that will colour the evolution of GRAIL’s operations and properties. GRAIL is not designed to support reasoning systems explicitly. It is anticipated that such systems will make use of GRAIL models but remain separate, with their own models for problem solving.

5.1. Using GRAIL in a Terminology Server

GALEN uses GRAIL to implement a Terminology Server (TeS) capable of supporting the development and integration of clinical systems. It does so through a range of key terminological services, built around a language-independent, re-usable and shared system of medical concepts [18]. Effectively, such a server is a mediation service between heterogeneous systems and applications, different medical coding schemes and different natural languages. A single coherent COncept REference (CORE) model of medical concepts is being developed as the foundation for an interlingua. Different applications and user groups may nevertheless locally extend parts of this model to suit their specific needs, for example by extending the coverage or resolution of the model [28]. Access to this CORE model is encapsulated by a Terminology Server, as shown in Figure 6. The CORE model also acts as a repository of concepts for new systems, so that individual applications are relieved of the task of capturing the conceptual knowledge of the clinical domain.
Multiple TeS are available to applications, although interaction between different servers depends on adherence to the global shared CORE model. GALEN thus joins other initiatives in knowledge sharing [33], but within a well-defined domain. Eventually, groups of applications, developers and sites will co-operate in developing and maintaining more CORE models, which they share and which support their joint efforts. The architecture separates language from knowledge concepts, and part of the current modelling effort has focused on identifying concepts from language [29]. Different languages do not necessarily represent the same concepts, which makes the requirement for multiple, extensible and dynamic ontologies all the more pressing. The CORE model and GRAIL’s controlling mechanisms of sanctioning and canonisation mean that models can be developed coherently without enforcing uniformity.



[Figure 6: Application A, User Group B and Application C connect to the Terminology Server. The server comprises a Multilingual Module, a Concept Module and a Code Conversion Module, backed by lexicons, the GRAIL engine with the CORE model, and code stores (Language, Concept and Code). The example traces the term “heart attack”, the heart attack concept, and the identifier A4128.]

Figure 6. The GALEN Terminology Server

The CORE model is being developed collaboratively by teams of clinicians distributed throughout Europe. The elementary concepts currently number in the hundreds, the knowledge-adding constructs (e.g. sanctions) in the thousands, and the implied particularisations that can be generated in the millions. This exercise has a number of significant characteristics which have influenced the usefulness and usability of GRAIL:

1. The CORE model is large;
2. The CORE model tries to be application-independent and therefore reusable;
3. It is not always clear whether what you wish to say falls into the category of terminological knowledge or of assertional knowledge;
4. It is a consortium effort with multiple teams working on it, so model integration and coherence are major difficulties, and a common modelling style managed by a modelling methodology has become essential.

Note here that the modellers are the users (clinicians) and that no attempt is made to interpose a ‘data modeller’. This is necessary not only to avoid a potential chasm of misunderstanding and misinterpretation, but also because the complexity of medicine is fully understood only by the experts themselves. Thus organisational knowledge [34] can become embedded implicitly in the models, and the requirements capture framework [30] is in danger (or is this an advantage?) of being missed altogether, or of remaining implicit.

5.1.1. Visualising the concepts and information models

A major difficulty has been the complexity and size of the models being created. When modelling in a dynamic and highly generative semantic data model which also constrains what can be generated, how does the modeller visualise the model they have created? How do they know they have generated sensible concepts? Generating sensible particularisations is the basic operation needed both for user interfaces based on data-entry forms and for semantic
analysis of natural language. There are issues of the cognitive capability of the modellers, of user interfaces to support the modelling process, and of modelling as a community rather than as an individual. Initially we had rather poor visualisation tools for our modellers; as the models have become larger and more complicated it has become clear that we need tools for:
• visualising the results of statements, as most of the concepts are implied;
• interactive input;
• coping with the scale of the model.

We have developed tools for visualisation, but have yet to complete tools for input (which is through a text-based compiler) or scoping/context mechanisms for orienting the modeller within the model. Early tools for examining the model emphasised implementation aspects of the structure. These were crucial to the successful development of the formalism itself, and invaluable to modellers with some understanding of the underlying implementation, but as more modellers were introduced to the environment we needed tools that abstracted away from the (changing) implementation and emphasised the model itself. The feel of the tools is heavily influenced by Objectworks\Smalltalk (the implementation platform of GRAIL). There are two basic types of operation that can be performed on a model: those which add information to the model and those which simply interrogate it. The tools follow this split: the workspace, file editor, file list and retraction tools are “knowledge adding tools”, while the browser and inspector are interrogative. The basic GRAIL operation is the evaluation of an expression. If evaluation succeeds and the expression is “knowledge adding”, that knowledge is added to the model. A knowledge adding expression is one with a name assigned to it, or with a necessary statement attached.
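The rule for knowledge-adding expressions can be captured in a few lines. The Python below is a minimal sketch, not GRAIL's Smalltalk implementation: `Expression`, `Model` and `is_knowledge_adding` are hypothetical names, and the expression text is kept as an opaque string rather than parsed GRAIL syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expression:
    """Hypothetical stand-in for a GRAIL expression; `text` is free-form."""
    text: str
    name: Optional[str] = None                          # a name assigned to it
    necessary: List[str] = field(default_factory=list)  # attached necessary statements


def is_knowledge_adding(expr: Expression) -> bool:
    """Knowledge adding = named, or carrying at least one necessary statement."""
    return expr.name is not None or len(expr.necessary) > 0


class Model:
    """Toy model that records knowledge-adding expressions in evaluation order."""

    def __init__(self) -> None:
        self.statements: List[Expression] = []

    def evaluate(self, expr: Expression) -> bool:
        """Return True if evaluating `expr` extended the model."""
        if is_knowledge_adding(expr):
            self.statements.append(expr)
            return True
        return False  # interrogative: the model is only queried, never changed


model = Model()
query = Expression("Fracture which hasLocation Femur")
named = Expression("Fracture which hasLocation Femur", name="FractureOfFemur")
print(model.evaluate(query), model.evaluate(named), len(model.statements))  # False True 1
```

Evaluating the same description twice shows the split: unnamed, it is only a query; once named, it becomes part of the model.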
A retraction tool supports the retraction of statements from the model, with operations such as retracting the last statement, or retracting everything up to and including the currently selected statement. This is essential, as modelling is often iterative. The GRAIL Browser supports the examination of GRAIL entities and provides information about the concept hierarchy. The browser is divided into three areas (Figure 7). Hierarchy shows the entity’s place in the model and operates in one of three modes: plain, attribute and generative. ‘Plain’ displays the hierarchy using the ‘is kind of’ relationship; ‘attribute’ allows selection of the relationship(s) used to display the hierarchy; ‘generative’ shows the entities generated, or implied, by the model from the selected entity. Criteria displays the defining, inherited and necessary criteria sets. Description displays statements, comments, codes or annotations according to the current user preferences. Figure 7 shows the browser in generative mode. The list on the right shows the attributes which can be used for generation. Selecting one or more of these before clicking ‘update hierarchy’ causes the appropriate implied entities to be generated and the display to be updated. The generated entities (indicated by a ‘#’) appear in the hierarchy as if they had been explicitly asked for individually. The display can be further updated either by selecting different attributes, whereupon the newly generated entities are added to the hierarchy, or by selecting a different entity before making an attribute selection.
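The two retraction operations can be sketched as an ordered statement log. This is a minimal illustration under an assumption: we read "everything up to and including the selected statement" as undoing from the most recent statement back to the selected one, since later statements may depend on earlier ones. `StatementLog` and its methods are hypothetical, and statements are plain strings.

```python
from typing import List


class StatementLog:
    """Hypothetical ordered log of knowledge-adding statements with the two
    retraction operations described for the GRAIL retraction tool."""

    def __init__(self) -> None:
        self._statements: List[str] = []

    def add(self, statement: str) -> int:
        """Append a statement and return its index (used as the 'selection')."""
        self._statements.append(statement)
        return len(self._statements) - 1

    def retract_last(self) -> str:
        """Retract the most recent statement; raises IndexError if the log is empty."""
        return self._statements.pop()

    def retract_through(self, selected: int) -> List[str]:
        """Retract every statement from the most recent back to, and including,
        the selected one; return them most recent first."""
        if not 0 <= selected < len(self._statements):
            raise IndexError(selected)
        retracted = self._statements[selected:][::-1]
        del self._statements[selected:]
        return retracted

    @property
    def statements(self) -> List[str]:
        return list(self._statements)


log = StatementLog()
log.add("statement 1")
chosen = log.add("statement 2")
log.add("statement 3")
log.add("statement 4")
print(log.retract_last())           # statement 4
print(log.retract_through(chosen))  # ['statement 3', 'statement 2']
print(log.statements)               # ['statement 1']
```

Undoing in reverse order keeps the log a consistent prefix of the modelling session, which suits the iterative style of work described above.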

A recent development is a constraint browser (Figure 8) designed to aid the exploration of a model. For example, it allows the interactive investigation of how particular pairs of entities can be combined: which sanctioning statements link them or their parents. It helps answer the modeller’s question: “how can I relate these concepts?”. The model can be passively clustered and modularised by using files containing the text of the statements, which can be filed in and out of the classifier. However, there are no active mechanisms such as the interpretations of the modelling process, microtheories or models of expectation discussed in [27, 28]. As the models are far too big to hold in a modeller’s head, or to display on a screen, the modellers resort to tricks to make the process manageable.
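The constraint browser's core question, which sanctions link two entities or their parents, can be sketched as a search up the ‘is kind of’ hierarchy. This is a minimal sketch on a hypothetical toy model: the hierarchy, the `(source, attribute, target)` sanction triples and the function names are illustrative, and the sketch ignores GRAIL's different levels of sanction and inverse attributes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical 'is kind of' links (child -> parents) for a toy model.
KIND_OF: Dict[str, List[str]] = {
    "Femur": ["LongBone"],
    "LongBone": ["Bone"],
    "Bone": ["BodyStructure"],
    "Skin": ["BodyStructure"],
    "Fracture": ["Lesion"],
    "Ulcer": ["Lesion"],
}

# Hypothetical sanctioning statements: (source, attribute, target).
SANCTIONS: List[Tuple[str, str, str]] = [
    ("Fracture", "hasLocation", "Bone"),
    ("Ulcer", "hasLocation", "Skin"),
    ("Lesion", "hasLocation", "BodyStructure"),
]


def ancestors(entity: str) -> Set[str]:
    """The entity itself plus everything it is (transitively) a kind of."""
    found: Set[str] = {entity}
    stack = [entity]
    while stack:
        for parent in KIND_OF.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def linking_sanctions(a: str, b: str) -> List[Tuple[str, str, str]]:
    """Sanctions that allow `a` to be related to `b`: those whose source is `a`
    or one of its parents and whose target is `b` or one of its parents."""
    up_a, up_b = ancestors(a), ancestors(b)
    return [s for s in SANCTIONS if s[0] in up_a and s[2] in up_b]


print(linking_sanctions("Fracture", "Femur"))
# [('Fracture', 'hasLocation', 'Bone'), ('Lesion', 'hasLocation', 'BodyStructure')]
print(linking_sanctions("Femur", "Fracture"))  # []
```

Returning both the specific and the inherited sanction shows the modeller why a combination is permitted, not just that it is.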

Figure 7. A generative browser (areas: Hierarchy, Description, Criteria)

5.1.2. Application Independence and Reuse

GALEN aims for an application-independent concept ontology, and strives to impose no assumptions about the ontology’s final use. One of GALEN’s goals is to test the hypothesis that a concept ontology is useful to a variety of different applications. Herein lies a dilemma, however: it is often difficult to know how detailed a model to produce without knowing its eventual use (i.e. what level of specification is required). Whether it is possible to produce application-independent models for reuse is debatable, since the modeller’s bias implicitly affects the ontologies. Le Roux [27] suggests that the ontological commitment of a
system in the world is seldom task-independent and that knowledge cannot be separated from its use. He proposes a cyclic, iterative model in which problem-solving models and domain ontologies are used alternately in model construction, with local and global refinement. Le Roux suggests that ontology sharing and reuse are only possible if their syntax and semantics are characterised so that they can be identified and matched. He uses meta-terms and the types of relations involved in the concept structure supported by a software environment, although much of his attention lies in reasoning systems, and GRAIL is not designed for reasoning. In GRAIL, relations are implicit in the definitions of the concepts themselves, and our structures are more general than Le Roux’s. Although GALEN advocates a common concept model independent of language concepts and application, it does recognise that alternative representations are appropriate to individual applications. GRAIL can transform between different forms and re-encapsulate structures in different ways, so knowledge sharing and reuse require less rigid adherence to a single form of description, and multiple ontologies are achievable. For example, for a mechanical service on a vehicle, the garage may require separate, detailed concepts for the equipment used, the measurements taken and so on, whereas the administrative view needs only a single concept of service. Clinical applications vary as to whether they wish to record ‘the ulcer’ or ‘the process of ulceration’; ‘fracture of the femur’ or ‘femur which is fractured’. GRAIL concepts can easily participate in multiple ontologies, such as structural, part-whole and causal, with appropriate support through the classification, sanctioning and canonisation processes discussed in section 3.
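The ‘fracture of the femur’ versus ‘femur which is fractured’ contrast can be made concrete: the two descriptions are different concepts (one is a kind of fracture, the other a kind of femur) built on the same relationship. The sketch below assumes a hypothetical pair of inverse attributes, `hasLocation`/`isLocationOf`, and is not GRAIL's canonisation procedure; it only illustrates how one relationship can be encapsulated around either concept.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical attribute pair: 'hasLocation' is treated as the canonical direction.
CANONICAL_TO_INVERSE: Dict[str, str] = {"hasLocation": "isLocationOf"}
INVERSE: Dict[str, str] = {
    **CANONICAL_TO_INVERSE,
    **{inv: canon for canon, inv in CANONICAL_TO_INVERSE.items()},
}


@dataclass(frozen=True)
class Criterion:
    """`focus which <attribute value>`: the focus is the concept being described."""
    focus: str
    attribute: str
    value: str


def relation(c: Criterion) -> Tuple[str, str, str]:
    """The underlying relationship, written in the canonical direction."""
    if c.attribute in INVERSE and c.attribute not in CANONICAL_TO_INVERSE:
        return (c.value, INVERSE[c.attribute], c.focus)
    return (c.focus, c.attribute, c.value)


def reencapsulate(c: Criterion) -> Criterion:
    """Describe the same relationship with the other concept as the focus."""
    return Criterion(c.value, INVERSE[c.attribute], c.focus)


fracture_of_femur = Criterion("Fracture", "hasLocation", "Femur")
femur_which_is_fractured = Criterion("Femur", "isLocationOf", "Fracture")

print(fracture_of_femur == femur_which_is_fractured)                      # False
print(relation(fracture_of_femur) == relation(femur_which_is_fractured))  # True
print(reencapsulate(fracture_of_femur) == femur_which_is_fractured)       # True
```

Keeping the concepts distinct while sharing one canonical relationship is what lets an application choose its preferred view without forking the model.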
Furthermore, GRAIL has a notion of ‘level of specification’, referred to in section 4.3, in which application-specific meta-attributes determine the level of detail a GRAIL particularisation must reach to be sufficient for that application, beyond which no further elaboration of the description is required. This is similar to the ‘detail’ dimension described in [28] and gives us some measure of the ontological commitment of the model.

5.1.2. Modelling in a generative environment

The requirement that the representation of sensible concepts be highly generative and parsimonious puts an extra strain on the modelling effort. In particular, it has proved challenging to walk the line between over-constraining and under-constraining the model. An over-constrained model cannot generate concepts that are required; an under-constrained model allows the generation of nonsensical concepts. Only once we had developed the generative browser could the modellers see, by visual inspection, the extent of the nonsense they could generate. There are multiple ways of defining, altering or extending models, with lesser or greater consequent restriction and expressivity, and the process is iterative [20]. In section 1 we discussed the necessity of a task model among the collection of models required to interact with the concept and information models for the medical record. Here too a task model is required, but within the context of developing concept models rather than constraining and controlling their use. We require tools with a task model that understands the modelling task itself and can offer context for the different facets of the model.
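The tension between over- and under-constraining can be seen in a toy generator. In this minimal sketch the single-parent hierarchy, the one-attribute sanction triples and the `generate` function are hypothetical simplifications; GRAIL's sanctions and generation are considerably richer. A loose sanction yields nonsense such as a fracture of a mood, while an over-tight one loses a required concept such as a fracture of the skull.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical single-parent 'is kind of' hierarchy.
KIND_OF: Dict[str, str] = {
    "Femur": "LongBone", "LongBone": "Bone", "Skull": "Bone",
    "Bone": "BodyStructure", "Skin": "BodyStructure", "BodyStructure": "Entity",
    "Mood": "MentalState", "MentalState": "Entity",
}

Sanction = Tuple[str, str, str]  # (base, attribute, permitted value type)


def is_kind_of(entity: str, ancestor: str) -> bool:
    """True if `entity` is `ancestor` or lies below it in the hierarchy."""
    current: Optional[str] = entity
    while current is not None:
        if current == ancestor:
            return True
        current = KIND_OF.get(current)
    return False


def generate(base: str, attribute: str, candidates: List[str],
             sanctions: List[Sanction]) -> List[str]:
    """Values for which `base which <attribute value>` is sanctioned."""
    return [v for v in candidates
            if any(is_kind_of(base, b) and a == attribute and is_kind_of(v, t)
                   for b, a, t in sanctions)]


CANDIDATES = ["Femur", "Skull", "Skin", "Mood"]
under = [("Fracture", "hasLocation", "Entity")]   # too loose
right = [("Fracture", "hasLocation", "Bone")]
over = [("Fracture", "hasLocation", "LongBone")]  # too tight

print(generate("Fracture", "hasLocation", CANDIDATES, under))  # ['Femur', 'Skull', 'Skin', 'Mood']
print(generate("Fracture", "hasLocation", CANDIDATES, right))  # ['Femur', 'Skull']
print(generate("Fracture", "hasLocation", CANDIDATES, over))   # ['Femur']
```

Listing everything a sanction set generates, as the generative browser does, is what exposes both failure modes.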

Figure 8. GRAIL Constraint Browser

5.1.3. The need for a methodology of modelling

Our experiences in building the very first model cannot be generalised as such to the future mass production of additional fragments, because building the model from scratch involved a deep understanding of a large number of preliminary issues, the design of many experiments leading to provisional decisions, and finally the establishment of a set of rules and solutions. As in modern software methodologies, explicit tasks and a documented log of decisions allow quality control and fruitful interaction between different people in the same process: enterprise modelling as described by [30]. In large-scale production (such as that envisaged by GALEN), the various skills required to develop the CORE model (medical, representational etc.) are distributed among a team of people co-operating at a given site on the same fragment of the global model, with many teams working on different fragments in parallel; in addition, applications will locally extend their own models. A methodology is required to support control over this co-operative, decentralised development: it has to facilitate the earliest possible discovery of potential overlaps and the resolution of inconsistencies, and to ensure that the model is an acceptable representation of the domain. We have an example of the ‘MultiModelling’ environment advocated by [28] and the commensurate requirement for capturing the dimensions of the models, although GRAIL does not concentrate on the teleological aspects, and we are not convinced that medicine can easily be divided into the epistemological types identified by [28].

As part of our experiences in developing the CORE model we have identified a number of different approaches (modelling tasks) which must be supported by the methodology and by the tool set, as suggested in [27, 30]. Sources (and requirements?) are drawn from a corpus of conventional clinical coding schemes, textbooks and clinical notes, but also from the experience of the clinical modellers themselves. The approaches are:
• Exploratory experiments (tests, verifications, new approaches)
• Structure-based development
• Systematic production
• Scattered additions to the existing model
• Deep piecewise revision

The approaches correspond to different goals of the modellers and to different stages in the project lifecycle; they also depend on the degree of formalisation of the sources available in the given subject field and on the level of detail requested by the applications. See [25] for a full discussion of the methodology.

5.1.4. Verification of the concept models

Verification falls into two categories:
• Verifying that the model is a true representation of the domain;
• Verifying that the model is formally coherent.

The first of these is a major exercise involving external bodies, application testing, workshops and so on. This is a particularly tricky issue in medicine, where the modellers’ subjective viewpoints of the domain become embedded in the model. The second, checking that the model is self-consistent, with no orphan concepts or inaccurate classifications, requires a thorough reclassification of all concepts in the knowledge base. The difficulty lies in recognising potentially incoherent or ambiguous configurations (generalisations of “Nixon diamonds”, in which a concept inherits conflicting information from two parents). The task is not merely to recognise that an ambiguous concept has occurred, but that one could be generated from the existing model. This requires repeatedly partitioning the network into subsets based on the information carried on each edge.
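The difference between finding an ambiguity that exists and one that could be generated can be illustrated with the textbook (non-medical) Nixon diamond. This is a minimal sketch with a hypothetical hierarchy and attribute values; simple value propagation stands in for GRAIL's classification, the edge-based partitioning described above is not reproduced, and treating every pair of concepts as combinable over-approximates what GRAIL's sanctions would actually allow.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical multiple-inheritance hierarchy (concept -> parents) and the
# attribute values asserted directly on some concepts: the classic diamond.
PARENTS: Dict[str, List[str]] = {
    "Nixon": ["Quaker", "Republican"],
    "Quaker": ["Person"],
    "Republican": ["Person"],
    "Person": [],
}
ASSERTED: Dict[str, Dict[str, str]] = {
    "Quaker": {"pacifist": "yes"},
    "Republican": {"pacifist": "no"},
}


def values(concept: str, attribute: str) -> Set[str]:
    """Values reaching `concept`: its own assertion if it has one,
    otherwise the union of what every parent contributes."""
    own = ASSERTED.get(concept, {})
    if attribute in own:
        return {own[attribute]}
    found: Set[str] = set()
    for parent in PARENTS.get(concept, []):
        found |= values(parent, attribute)
    return found


def existing_ambiguities(attribute: str) -> List[str]:
    """Concepts that already inherit conflicting values (actual diamonds)."""
    return sorted(c for c in PARENTS if len(values(c, attribute)) > 1)


def potential_ambiguities(attribute: str) -> List[Tuple[str, str]]:
    """Pairs of unambiguous concepts whose combination would be ambiguous.
    Every pair is checked, so the cost grows quadratically with model size."""
    clean = sorted(c for c in PARENTS if len(values(c, attribute)) <= 1)
    return [(a, b) for a, b in combinations(clean, 2)
            if len(values(a, attribute) | values(b, attribute)) > 1]


print(existing_ambiguities("pacifist"))   # ['Nixon']
print(potential_ambiguities("pacifist"))  # [('Quaker', 'Republican')]
```

The pairwise check for potential ambiguities is what makes total analysis expensive on a large model, which motivates the incremental and parallel strategies discussed next.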
Incremental and partial strategies are under investigation, but the total analysis is potentially large, though bounded in practice. To this end we have recently embarked on a project using a massively parallel platform, exploiting the inherent parallelism in the classification algorithms in order to make the integration and verification of large models realistic. A major goal of the project is to empower peripheral users to extend and modify the concepts model for their own purposes. Fast verification would allow distributed, parallel development in multiple electronically linked centres, with communities of users extending a common concepts model as part of a common infrastructure for developing coherent systems that can interwork and be integrated.

5.2. Experiences of the Unified Model: PEN&PAD

A prototype clinical workstation, PEN&PAD, has been implemented, based around GRAIL’s predecessor, SMK, and the three-space model. It has been prototyped in Objectworks\Smalltalk™ and mapped to the relational database Sybase™. PEN&PAD has been evaluated against existing GP systems at a number of surgeries, during real consultations. The medical records created with PEN&PAD contained considerably more detail than those in the doctors’ more conventional systems, in that they told accurate and detailed stories.


PEN&PAD uses predictive data entry [15] as its input and display method. For a given clinical concept the system generates a data entry form containing all of the likely options for modifiers and additional statements. The function of the schema is to determine what it is sensible to say in any given context, and hence it must be sufficiently expressive to capture all of the significant information in the medical record. Predictive data entry may be thought of as the clinician asserting or refuting belief in potential statements thrown up by the CONCEPT model and modified by the patient’s ACTOR model. SMK category space proved to be an effective approach to modelling medical concepts generatively. Modelling all four schemas (the system of CONCEPTS, TASKS, ACTORS and the MEDICAL RECORD) using a single formalism allowed a tight integration of the four; it has proved possible to impose constraints from any one schema on any other. This has led to a powerful, tailorable clinical workstation. However, the integration also made the models more complex, and the tools for visualising and manipulating these connected schemas were correspondingly more difficult to specify and implement, and were quite slow. The clear distinction between observations and individuals, and between what is observed and what is true, came as a direct result of building a medical record to be used by many different healthcare workers. Representing the medical record as separate spaces for individuals and observations is a powerful idea, and allows us to restrict the CONCEPTS, TASKS and MEDICAL RECORD schemas to be actor-specific. However, there are problems in relating new constraints on individuals to observations of older data, and the semantics of individual space are somewhat overloaded and can be pushed too far. These problems, and the semantics of occurrence space, are discussed in detail in [17] and presented briefly here.

5.2.1. The Semantics of Individuals

Individual space, and individuals, arose as (i) a distillation of unattributed, atemporal data about actors and (ii) a way of representing actor-specific constraints on the more general category space. Individual space therefore has two roles:

a) as indices and abstraction summaries for the occurrence instances, i.e. summarising data instances about the subsumed occurrences of the actors. However, this ‘index’ turns out to be at best a discursive or conceptual index, not an existential one. An individual exists because some observation has been recorded concerning it, and that observation may not have been positive: for example, an observation about Mrs Jones’s Diabetes may be that she has not got it. Because occurrences can conflict, we may have an occurrence that Mrs Jones has Diabetes attributed to one doctor and an occurrence that she does not have Diabetes attributed to another, yet both will be subsumed by the individual Mrs Jones’s Diabetes. Individual space is good for ‘scene setting’ the occurrence space ‘story’ (by placing observations in context) but bad for story reconstruction, because it lacks temporal and actor attribution. Aggregation requires work on the temporal and constraint modelling, more assertional knowledge and the enrichment of individual space.

b) as actor-specific schema, since individual particularisations can restrict the TASKS, CONCEPTS and MEDICAL RECORD models represented in category space by ACTOR-specific constraints. A patient’s record is written in the order events are told, not the order in which they happened, and observations about a patient restrict what may be said in future encounters, which causes difficulties when making retrospective statements. For example, cancelling the concept of Mrs Jones’s leg after an amputation makes all past observations of Mrs Jones’s leg, and all future observations that retrospectively enhance a past observation of Mrs Jones’s leg, unclassifiable and hence illegal.
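Both roles of individual space, and the retrospective-entry problem, can be reproduced in a small sketch. It is a hypothetical illustration, not PEN&PAD's data structures: individuals are plain dotted names (e.g. `MrsJones.Diabetes`), `Occurrence` and `PatientRecord` are invented for the example, "Dr A" and "Dr B" are placeholder observers, and the cancellation rule is deliberately naive so that it exhibits the difficulty described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Occurrence:
    """One attributed observation in occurrence space."""
    individual: str   # hypothetical dotted name, e.g. "MrsJones.Diabetes"
    present: bool     # positive or negative finding
    observer: str     # who made the observation
    told_on: str      # when it was recorded, not when it happened


@dataclass
class PatientRecord:
    occurrences: List[Occurrence] = field(default_factory=list)
    cancelled: Set[str] = field(default_factory=set)  # actor-specific constraints

    def _is_cancelled(self, individual: str) -> bool:
        # Cancelled if the individual itself, or a whole it belongs to, was cancelled.
        return any(individual == c or individual.startswith(c + ".")
                   for c in self.cancelled)

    def record(self, occ: Occurrence) -> None:
        """Naive rule: observations about cancelled individuals cannot be classified."""
        if self._is_cancelled(occ.individual):
            raise ValueError(f"{occ.individual} is cancelled; observation rejected")
        self.occurrences.append(occ)

    def individuals(self) -> Dict[str, List[Occurrence]]:
        """Role (a): individual space as an index over the occurrences."""
        index: Dict[str, List[Occurrence]] = {}
        for occ in self.occurrences:
            index.setdefault(occ.individual, []).append(occ)
        return index

    def conflicts(self) -> List[str]:
        """Individuals subsuming both positive and negative occurrences."""
        return sorted(name for name, occs in self.individuals().items()
                      if len({o.present for o in occs}) > 1)

    def unclassifiable(self) -> List[Occurrence]:
        """Role (b) side effect: past occurrences stranded by a later cancellation."""
        return [o for o in self.occurrences if self._is_cancelled(o.individual)]


rec = PatientRecord()
rec.record(Occurrence("MrsJones.Diabetes", True, "Dr A", "1993-05-11"))
rec.record(Occurrence("MrsJones.Diabetes", False, "Dr B", "1993-06-02"))
rec.record(Occurrence("MrsJones.Leg.Ulcer", True, "Dr A", "1993-05-11"))
print(rec.conflicts())            # ['MrsJones.Diabetes']

rec.cancelled.add("MrsJones.Leg")  # amputation
print(len(rec.unclassifiable()))  # 1
try:  # a retrospective note about the leg, told after the amputation
    rec.record(Occurrence("MrsJones.Leg.Ulcer", False, "Dr B", "1993-07-01"))
except ValueError as err:
    print(err)  # MrsJones.Leg.Ulcer is cancelled; observation rejected
```

Because the individual carries no dates or attribution of its own, the index can report that a conflict exists but cannot say which story is current, and the cancellation blocks even legitimate retrospective entries.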

5.2.2. The Semantics of Occurrences

Occurrences are particularisations representing instances of the MEDICAL RECORD model. In the unified model they are treated in the same way as category and individual particularisations, but they turn out to have quite different semantics. Particularisations in category space are implied by the schema, in that they are virtual and may be created automatically; they do not add to the schema. Particularisations in individual space are implied unless they are being used as actor-specific schema, in which case they extend the schema and so must be asserted. Particularisations in occurrence space, however, hold the data for medical records and so must be asserted as data rather than implied as concepts.

5.2.3. Meta-observations

Individual space has difficulty with retrospective occurrences, disagreements between clinicians and the uncertainty of observations. Meta-observations solve the problem of grouping a number of related observations together: once they have been introduced, it is no longer necessary to try to group related observations using the subsumption hierarchy. However, should we attempt to reconstruct the current state of the patient dynamically from occurrences and meta-observations, or should we have a ‘state cache’ which has different semantics to individual space? It is possible that these constraints, currently modelled as meta-observations, should not be represented as part of the data model, but instead as rules within a higher layer of the system.

5.2.4. The Task Model

In figure 4 we attempted to integrate the four models of figure 1 into the three-layer model of GRAIL. We would claim to have had some success with regard to all the models except the TASK model. PEN&PAD attempted a full unification by using GRAIL as a way of expressing clinical dialogue, user interface and interaction tasks.
Condition-sensitive (rather than actor-sensitive) data entry forms were achieved, with their content specified in the TASKS model using Data Entry Activity Protocols (DEAPs) which annotated the CONCEPTS model via default statements. The operation of the interface and the choice of data presented could be tailored to be actor-specific by default assertions in individual space. However, we ended up with a spaghetti of assertional default statements and hard-wired code. It is difficult to represent strict orderings of functions (first do X, then do Y), conditional expressions and ‘wrappers’ for structural part-whole task definitions. Creating all the potential occurrences of any requested concept, classifying them and then allowing the user to pick which entities should be included as occurrence instances guarantees correctness with respect to the model, and honours any differences due to the particular actors involved. However, there is a danger of naively generating several hundred occurrences, many of which may be unwanted. Our experiences confirmed the major requirement for a declarative TASKS model, but the jury is out as to whether GRAIL was the appropriate representation to use or whether we should integrate it with a problem-solving model as in [27]. As part of the GALEN project, the Structured Clinical User Interface (SCUI) is intended to collect clinical information in the infectious disease domain at Geneva Hospital. It is to be used to structure clinicians’ interaction with the system so that pertinent information is requested and entered. The SCUI
uses the GALEN Terminology Server, a medical concepts model described in GRAIL and a dialogue model held in GRAIL as default statements. The SCUI’s data instances are not modelled in GRAIL.

5.2.5. Value Space in PEN&PAD

Values, such as ‘Severity’, ‘140/90’ or ‘15 The Street Manchester’, are incorporated in all three spaces: we wish to include values as part of the concepts model (Cancer is severe), the actors model (Mrs Jones lives at 15 The Street Manchester) and the patient data (Mrs Jones had a blood pressure measurement of 140/90 on 11/5/93). Values can be compositional particularisations, which raises some difficulties in their classification (in particular, the requirement that they be unique) and clutters the category space. They also do not participate in subsumption. This leads us to suggest that a further, specialised space is needed to deal with values.

6. Summary

We have suggested that medicine requires multiple, interrelated data models of far greater complexity than conventional semantic data models, and have proposed a generative knowledge representation formalism, GRAIL, which is pragmatically more suitable for the domain than those currently proposed in the knowledge base systems and semantic data model literature. The controlled generative nature of the formalism means that a sparse model can expand into a comprehensive schema capable of covering all medical concepts without each and every concept being asserted explicitly. Although we have highlighted some difficulties we have encountered when using GRAIL, in particular the cognitive complexity of building large models, we have built a working prototype. PEN&PAD and the SCUI have automatically generated data entry forms based entirely on implied particularisations and sanctioning constraints in GRAIL. A large-scale demonstration of a multilingual knowledge base is being built as the heart of the GALEN project.
GRAIL may not be perfect, but we believe that it is a significant improvement over other approaches in the medical arena, and it tries to tackle some difficult issues head on. It can do this because it has been designed for a clear, real application domain. Four members of the GRAIL team are medics, which has been important when we have been blown off track by worries that we haven’t found the general solution to some knowledge representation problem. For example, we don’t deal with general negation: in medicine there is no requirement for it that cannot easily be modelled around. We realise that we don’t have to be perfect; we just have to be good enough, and we are much better than what is available now. Future work includes a formal specification of GRAIL and support for data aggregation across patient records. Aggregation requires work on temporal and constraint modelling, more assertional knowledge and the enrichment of individual space. New work will investigate the use of parallelism to support the model verification exercise and as a fast platform for a corporate predictive data entry service. Major work lies in providing distributed, large-scale collaborative modelling support environments and methodologies.

Acknowledgements The authors would like to acknowledge the other members of the Medical Informatics Group who have contributed to the work discussed in this paper. This research is supported in part by the United Kingdom Medical Research Council grant number SPG 8800091, the Department of Health, and the European Community under the Advanced Informatics in Medicine (AIM) GALEN project 2012. References [1]

[2]

[3] [4] [5] [6] [7] [8] [9]

[10] [11] [12] [13]

[14]

[15]

[16]

[17]

[18]

Goble CA, Glowinski AJ, Nowlan WA, Rector AL (1992) “A Descriptive Semantic Formalism for Medicine” in: Proceedings of the Ninth International Conference on Data Engineering, IEEE Computer Society Press, pp. 624-632. Rector AL, Nowlan WA and Kay S (1992) “Conceptual Knowledge: The Core of Medical Information Systems” in: Lun KC, Degoulet P, Pierre TE, Rienhoff (eds) MEDINFO 92, Proceedings of the Seventh World Congress on Medical Informatics, Geneva, North-Holland pp.1420-1426 Brachman RJ, Fikes RE and Levesque HJ (1983) “KRYPTON: A functional approach to knowledge representation” in: IEEE Computer 16(10) pp. 73-76 Rector AL, Nowlan WA, Kay S, Goble CA, Howkins TJ (1993) “A Framework for Modelling the Electronic Medical Record” in: Methods of Information in Medicine 32(2) pp. 109-119 Hull R and King R (1987) “Semantic database modelling: survey, applications and research issues” in ACM Computing Surveys 19(3) pp. 210-260 Kay S: Towards relevant medical information systems, Med. Inform. Vol 15 No 4, pp-327-331 Chen P.P. “The Entity-Relationship Model - Towards a Unified View of Data”. (1979) In ACM Trans on Database Systems 1(1) pp.9-36. Nijssen GM and Halpin TA (1989) “Conceptual Schema and Relational Database Design”, PrenticeHall. Su SYW “An Object-Oriented Semantic Association Model (OSAM*) (1988) in AI in Industrial Engineering and Manufacturing: Theoretical Issues and Applications” Kumara, Kashuap and Soyster.(eds) American Insititute of Industrial Engineers Brachman RJ and Schmoize JG (1985) ‘‘An overview of the KL-ONE knowledge representation system’’ in: Cognitive Science 9, pp. 171-216 Borgida A, Brachman RJ, McGuinness DL and Resnick LA (1989) “CLASSIC: A structural data model for objects” in: SIGMOD Record 18(2), pp. 58-67. Nebel B, (1988) “Computational complexity of terminological reasoning in BACK” in: Artificial Intelligence, 34(3) pp. 
371-383 MacGregor R, “The evolving technology of classification-based knowledge representation systems” in Sowa J (ed) Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufman (1990) Goble CA, Glowinski AJ , Jeffery KG (1993). ‘‘Semantic Constraints in a Medical Information System’’ in: (Eds) Worboys M, Grundy F, Proceedings of BNCOD11, Lecture Notes in Computer Science 696 Advances in Databases, Springer-Verlag, pp. 40-57. Rector AL, Goble CA, Horan B, Howkins TJ, Kay S, Nowlan WA and Wilson A (1990).“Shedding Light on Patient’s Problems: Integrating Knowledge Based Systems into Medical Practice”, in: L Aiello (ed), Proceedings of the Ninth European Conference on Artificial Intelligence, ECAI 90, Pitman Publishing, pp 531-534 Beck HW, Gala SK and Navathe SB (1989) “Classification as a query processing technique in the CANDIDE data model” in: Proceedings Fifth International Conference on Data Engineering, pp. 572581. Goble CA and Crowther PJ (1994) “Schemas for Story Telling in the Medical Record” in (Jarke M, Bubenko J, Jeffery K eds) Proceedings of the Advances in Database Technology EDBT 94, March Cambridge, UK, Springer-Verlag, pp.393-406 Rector AL, Solomon WD, Nowlan WA and Rush TW (1994) “A Terminology Server for Medical Language and Medical Information Systems” in Proceedings of IMIA 94, Geneva, May 1994.

28


[19] Haw D, Goble CA and Rector AL (1994) “GUIDANCE: Making it Easy for the User to be an Expert”, to appear in: Proceedings of the 2nd International Workshop on User Interfaces to Databases, IDS94, July 1994, Ambleside, UK.
[20] Solomon WD (1994) “The Master Notation, version 2”, AIM Deliverable id A2012/D14, University of Manchester, UK.
[21] Woods WA (1975) “What’s in a link: foundations for semantic networks”, reproduced in: Brachman RJ, Levesque HJ (eds) Readings in Knowledge Representation, Morgan Kaufmann, CA, pp. 41-70.
[22] Rector AL, Nowlan WA, Glowinski AJ and Matthews GM (1993) “The GRAIL Kernel: GALEN Representation And Integration Language, version 1”, AIM Deliverable id A2012/D6, University of Manchester, UK.
[23] Brachman RJ, McGuinness DL, Patel-Schneider PF, Resnick LA, Borgida A (1991) “Living with Classic: When and how to use a KL-ONE-like language” in: Sowa J (ed) Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, CA, pp. 401-456.
[24] Sowa J (1985) “Conceptual Structures: Knowledge Representation in Mind and Machine”, John Wiley & Sons.
[25] Rossi Mori A, Agnello P, Galeazzi E, Lorino F (1994) “The Consolidated CORE Model and Code Conversion Module version 2”, AIM Deliverable id A2012/D16, University of Manchester, UK.
[26] Jaakkola H, Kangassalo H (eds) Information Modelling and Knowledge Bases VI, Proceedings of the 4th European-Japanese Seminar on Information Modelling and Knowledge Bases, Kista, Sweden, June 1994, IOS Press, Amsterdam, 1995.
[27] Le Roux B (1995) “Steps towards a Unified Approach of Knowledge Modelling” in [26].
[28] Toppano E, Chittaro L, Tasso C (1995) “Dimensions of Abstraction and Approximation in the Multimodeling Approach” in [26].
[29] Bijl A (1995) “Showing your mind” in [26].
[30] Bubenko J, Kirikova M (1995) “‘Worlds’ in Requirements Acquisition and Modelling” in [26].
[31] Gustas R (1995) “Towards understanding and formal definition of conceptual constraints” in [26].
[32] Saltor F, Castellanos M, Garcia-Solaco M (1995) “Modelling specialization as BLOOM semi-lattices” in [26].
[33] Lenat DB, Guha RV, Pittman K, Pratt D, Shepherd M (1990) “Cyc: toward programs with common sense” in: Communications of the ACM 33(8) pp. 30-49.
[34] Verrijn-Stuart AA, Ramackers GJ (1995) “Embedded organisational knowledge: Requirements for and usage of conceptual information system models” in [26].
