The University of Adelaide. Adelaide 5005, Australia. ABSTRACT ... Figure 1: Example of a Hasse Diagram and a Folded Hasse Diagram a b c e f h g a b c e.
Text Retrival for Medical Discharge Summaries using SNOMED and Formal Concept Analysis R.J. Cole and P.W Eklund
Department of Computer Science The University of Adelaide Adelaide 5005, Australia ABSTRACT Our trial uses a set of 9,000 patient medical discharge summaries from the Thorasic unit at the Royal Adelaide Hospital. The discharge summaries are indexed using SNOMED, Systematized Nomenclature of Medicine. The documents are semi-structured free text documents with their structure described using SGML, Standardized Generalized Markup Language. The medical discharge summaries are used as a training set and the resulting synthesized concept lattice compared with SNOMED. The approach described is novel. An indexing agent builds an index from a clinical conceptual hierarchy into a document set. The user submits a query as the result of a walk through the concept space, displayed to the user in the form of a lattice. The lattice structure is constructed using formal concept analysis and concepts from the clinical hierarchy are considered attributes. Sentences within the documents are treated as objects.
1 Background A common approach to text retrieval involves phrase indexing in documents. There are two approaches to phrase indexing in the text retrieval literature: (i) deriving phrases and building indices from into collections of text documents. These systems generally attempt to generate phrases from the text collection by statistical techniques; (ii) use a thesaurus to derive associations between terms. These thesauri usually have some hierarchical structure as well as a number of other relations that form a loose conglomeration of terms. The later systems (ii) have met with wider success than (i) because construction of the thesauri are performed by human experts with large amounts of domain speci c knowledge. However, they are limited because the structure of the thesaurus is dependent on domain knowledge and not related to the structure of concepts contained within the document. On the other hand, this research presents a mechanism for incorporating information from two sources: (i) a hierarchically structured thesaurus and (ii) the inherent structure of the concepts within a corpus of text documents. These two pieces of information are merged into a single structure called a formal concept lattice [9].
1.1 The Documents
An index into a corpus of 9,000 patient discharge summaries was built by incorporating SNOMED (Systematized Nomenclature of Medicine), a thesaurus of medical terms, into a formal concept lattice representing the combinations in which concepts appear in the documents. SGML[6] is used to represent the structure of discharge summaries. It is an international standard that has received wide spread use in the text retrieval community in the task of describing document structure[8]. The discharge summaries were obtained from the Thoracis Unit at the Royal Adelaide Hospital (RAH) as a printer dump. References to names of the patient in the discharge summaries were replaced with a generic marker. Document instances were passed through an SGML parser together with the document type declaration and the output of the parser was a folded representation of the structure and data of each document. This was stored in a le for processing. Several medical language thesauri exist. SNOMED[2], UMLS[5] (Uni ed Medical Language System) and the RCC System[1] (Read Clinical Coding System) all have a subsumption relation corresponding to a specialization/generalization hierarchy. All candidates de ne phrases describing each clinical concept. 1
a
b
c
e
f
a
d
b
g
c
e
f
h
h
(a)
(b)
Figure 1: Example of a Hasse Diagram and a Folded Hasse Diagram a
a Unfold Element c
b
e
f
e
h
c
b
f
g
h
Figure 2: Example of an UnFold Operation Each phrase is composed of a number of terms or lexical elements. The clinical thesaurus chosen for use in this research was SNOMED. Concepts in SNOMED are described by a set of phrases. Each phrase consists of either a single term or a set of terms. A concept is assumed to be mentioned within a document if all terms in a phrase for that concept appear within a sentence in that document. A partially ordered set is a set of elements that has a subsumption relation de ned on it. Partially ordered sets are commonly called posets. Both the SNOMED hierarchy of concepts and the formal concept lattice (described in Section 3) have poset structures. In order to display these structures and manipulate them, a viewer for walking partially ordered sets was produced. Partially ordered sets can be drawn as Hasse Diagrams[3]. SNOMED contains 100,000 concepts and so it is impossible for a user to comprehend the Hasse diagram of all SNOMED concepts. Instead, a user should be presented with a portion of the hierarchy and should be able to move around in the hierarchy changing the portion displayed. Two classi cations of elements are introduced into the Hasse diagram, folded and unfolded elements. Folded elements have only a portion of their children displayed and unfolded elements have all of their children displayed. An example of a Hasse diagram corresponding to a poset in Figure 1(a) and a representation with folded and unfolded elements of the same poset is shown in Figure 1(b). The folded elements in Figure 1(b) are represented by a square while the unfolded elements are represented by a circle. Figure 2 shows the action of unfolding an element. The poset being represented is the same as that shown in Figure 1(a). Element c is unfolded and g is subsequently added to the folded representation. Figure 3 shows the action of folding an element, and again the poset is the same as that shown in Figure 1(a). Element c is folded and so removed from the diagram. As a result element g is also folded since it does not have another parent displayed in the folded Hasse diagram. However, e does have another parent d, displayed in the folded Hasse diagram and so is not folded. Likewise, element h has
a
a Fold Element c
b
e
f
b
g
e
h
f
h
Figure 3: Example of a Fold Operation
Figure 4: Poset Viewer : Query Results indicates document numbers containing query terms. other parents displayed in the Hasse diagram and so is not folded as a result of g being folded.
1.2 Poset Viewer Implementation
Figure 4 shows this poset viewer displaying a portion of the SNOMED hierarchy. At the top of the window is a set of choices. These determine the active tool: Unfold, Fold, Where, Move and Query. The Unfold and Fold tools are used to manipulate the folded Hasse diagram. The \Where" tool gives the location of the mouse when clicked. The \Move" tool is used to move elements displayed in the folded Hasse diagram. The \Query" tool obtains a list of all documents containing a SNOMED concept displayed in the Hasse diagram. To the left of the window is a list box containing results of the last query by document number. Clicking on an entry in the list box causes the corresponding document to be displayed. Documents are displayed in another window in a form that represents their structure. The documents are composed of structural elements which may contain other structural elements and paragraphs of text. Each structural element has a title which is always displayed. A structural element may be either unfolded or folded. If the element is folded then only the title of the structural element is displayed. If the element is unfolded then all of the structural elements and paragraphs of text inside the structural element are displayed. Figure 5 is an example of a patient discharge summary. Here some of the elements are unfolded and some are folded. The structural element \Patient Discharge Summary" is unfolded and the structural element \Patient File Information" is folded. As a result only the title for this structural element is displayed. Figure 6 is the poset viewer displaying a folded Hasse diagram representation of a formal concept lattice. Formal concept analysis is described below in Section 3. The tool chooser at the top of the
Figure 5: Structured Document Viewer Displaying a Patient Discharge Summary.
Figure 6: Poset Viewer Displaying a Formal Concept Lattice.
Digital Seach Tree of Terms
Concept Structure Concept Structure
Linked List of Indices into Digital Search Tree Concept Number
Figure 7: Simple Data Structure used for Indexing. window shown in Figure 6 has the options \Unfold", \Fold", \Where", \Move", and \Attr". \Unfold", \Fold", \Where" and \Move" have the same behavior as described above. The \Attr" command is speci c to poset viewer used to display formal concept lattices. This tool displays a list of attributes associated with concepts. There are two types of concepts displayed: base and abstract. Base concepts refer to a particular document. Abstract concepts refer to a number of documents.
2 Construction of Document Index The ASCII SNOMED le was parsed and the data structure represented in Figure 7 created. Each record in the le is represented by a concept structure. A digital search tree maps terms from phrases describing SNOMED concepts to sets of concept structures. Concept structures store a linked list of indices into the digital search tree corresponding to terms in the associated phrase. The folded representation produced by SGMLS is parsed and stored in a tree like data structure. The text of each paragraph is sent to a lexical analyzer separating paragraphs into sentences and passes the result to an indexing agent. The indexing agent extracts concepts from each sentence. A concept is considered to be present in a sentence if all of the terms in one of the phrases describing that concept are present in the sentence. The indexing agent uses the algorithm shown in Figure 2. The indexing agent returns a set of concepts found in each sentence. The union of all concept sets for all sentences is taken and the result stored in a le. This le stores a mapping from documents to sets of concepts found in that document.
2.1 Subsumption Indexing
Concepts contained within SNOMED have a subsumption relation de ned upon them that produces a specialization/generalization hierarchy. When traversing this hierarchy evidence that a concept is present in a discharge summary is provided by any specialization of that concept also being present in the discharge summary. To support this form of indexing, we build on the simple indexing mechanism already implemented, and extend it by including with each document the up-set of the simply extracted concepts | an up-set of a set A is the union of all ancestor sets of elements from A. This works because the depth of SNOMED is generally less than 10. The maximum increase in the number of concepts per document is by a factor of the maximum depth. This worst case is generally not reached because the average depth of SNOMED is much less than 10, and many concepts will have common ancestors. SNOMED concepts were placed in a digital search tree using their codes as the character string index. Figure 9 shows an example of a number of SNOMED concepts. Figure 9 shows these concepts placed in a digital search tree. The code of a leaf node may be determined by traversing to the base of the tree through parent nodes. The set of parent nodes may also be determined by traversing to the top of the tree and following paths from each parent containing all zeroes. These paths lead precisely to the parent
Inputs:
let T be the set of all terms let S = fti g be the set of terms in the sentence let C = fci g be the set of concepts let Pi = fpij g be the set of phrases which describe concept ci let a phrase p be a set of terms which compose that phrase
Outputs:
let D be the set of concepts contained in the sentence
for ti 2 S loop for p 2 fpij jti 2 pij g loop if S \ p = p then add ci to D end if end loop end loop Figure 8: Algorithm for Concept Extraction from Free Text Documents. T-10000 Integumentary System
T-11000 Musculoskeletal System
T-11100 Bone
T-11200 Shoulder Girdle
Figure 9: Example Tree of SNOMED Concepts T
1
0
1
0
0
1
2
0
0
0
0
0
0
0
0
T-11200 Shoulder Girdle T-11000 Musculoskeletal System
T-11100 Bone
T-10000 Integumentary System
Figure 10: Data-Structure used for Subsumption Indexing
nodes of a given concept.
3 Formal Concept Analysis Formal concept analysis was rst proposed by Wille[9]. It is a mechanism for formulating a model of the world in terms of objects and attributes. It is assumed that a relation is provided that connects objects to the attributes they possess. Formal concept analysis introduces an entity called a concept. A concept is a set of attributes and objects. Attributes are maximally possessed by the set of objects and similarly the objects are the maximal set which all possess the set of attributes. Concepts are placed in a lattice structure in which the meet and join of any combination of elements are given by de nition. This concept lattice not only contains concepts corresponding to each object but also concepts corresponding to the meet and join of other concepts. The lattice can express all relationships between attributes. For example, the lattice can represent the relationship that if an object has one attribute then it must have another speci ed attribute. It is the capability to express such relationships that makes the lattice a powerful algebraic structure. The basic structure of formal concept analysis is the context. A context is comprised of a set of objects, a set of attributes and a relation describing which objects possess which attributes. In the formal de nition, the set of objects is denoted by G (for the German word Gegenstande), and the set of attributes is denoted by M (for the German Merkmale meaning attributes).
De nition 1 A context is a triple < G; M; I > where G and M are sets of elements corresponding to objects and attributes respectively and I is a binary relation de ned on G M . gIm is used to denote that (g; m) 2 I and means that object g has attribute m. From the context a set of concepts is derived. Each concept is a pair (A; B) where A is a set of objects and B is a set of attributes. Both A and B are maximal sets that conform to each other.
De nition 2 Let < G; M; I > be a context. Then the extent of a set of objects A is denoted A and is 0
given in the following formula. The intent of a set of attributes B is denoted B and is also given below. 0
A = m 2 M j(8g 2 A)gIm 0
B = g 2 Gj(8m 2 B)gIm 0
De nition 3 Let < G; M; I > be a context. A concept is a pair (A; B) where A 2 G, B 2 M , A = B
0
and B = A . 0
In other words a concept (A; B) is a pair where A 2 G, B 2 M, A is the intent of B and B is the extent of A. It is possible to de ne a subsumption relation on the set of concepts.
De nition 4 Let < G; M; I > be a context. The subsumption relation is de ned on the set of concepts formed from the context such that (A1 ; B1 ) (A2 ; B2 ) i A1 A2 . These de nitions now lead us to the Fundamental Theorem on Concept Lattices which says that the set of concepts and the subsumption relation de ned above forms a complete lattice.
Theorem 1 (Fundamental Theorm on Concept Lattices) Let < G; M; I > be a context. Then if B is the set of all concepts formed from the context, and is the subsumption relation on concepts de ned above then < B; > is a complete lattice in which the join and meet are given by, _ [ \ j 2J
^
j 2J
(Aj ; Bj ) = (( (Aj ; Bj ) = (
Aj ) ; 00
j 2J
\
j 2J
Aj ; (
j 2J
[
j 2J
Bj )
Bj ) ) 00
This theorem[9] says that the set of concepts produced by a context is a lattice and the meet and joint of a group of concepts can be calculated by the above formula involving the intersection of attributes and objects.
Context Objects
Object Set
Attributes
Attribute Set for an Object
Attribute Set
Figure 11: Context Data Structure.
3.1 Data Structure for Formal Concept Analysis
Formal concept analysis views the world as consisting of objects and attributes. The documents were considered as objects and the SNOMED concepts attributes. A document possesses the attribute of a particular SNOMED concept if that SNOMED concept could be extracted from the SNOMED hierarchy. There are two main components to formal concept analysis: the rst is the context and the second, the lattice structure. The context is composed of a set of objects, a set of attributes and a relation which says which objects have which attributes. A pictorial representation of the data structure used to store the context is shown in Figure 11. The object-attribute relation is not stored explicitly in the context data structure but rather as a mapping from objects to sets of attributes. This is done because this relation is only ever used to generate this mapping. The context stores objects and attributes. Storing these elements in an unordered list means that lookup of these objects is very inecient and so they are given a full order de ned on them. The full order de ned on the attributes is arbitrary since they are completely unordered, so their memory locations are used to de ne the order. The objects are more structured elements because they have sets of attributes associated with them. The ordering on the objects was de ned in terms of the attributes they possess. The ordering is equivalent to considering the attributes of an object as a binary number where a bit is a one if the object possesses that attribute and zero if the object does not possess that attribute. The correspondence between bits and attributes is given by the ordering of the attributes. From the context the formal concept lattice is produced. The subsumption relation for concepts can be de ned in terms of the extent of a concept. As a result the data structure for a concept stores only the extent. This is really necessary because the intent of concepts near the top of the lattice are large portions of the object set. Representing a concept as a set of attributes does not suer from the same problem because the attributes associated with a concept (the extent of the concept) must be a subset of the concepts associates with an object, and each document only contains a very small subset of the total number of medical concepts in the domain. Concepts are stored in a Hasse diagram data structure, and are generated incrementally by adding objects one at a time to the context and determining all new concepts generated by the addition of the new object. The Hasse diagram data structure used to structure the concepts was taken from an existing C++ container class library. A mapping from each attribute to the maximal element containing that attribute is stored. When a new object is added to the context the down-set of each maximal concept corresponding to a least
Inputs:
Let < G; M; I > be a context. Let o be the new object which is being added to the context Let B be the lattice of concepts generated from < G; M; I >
Outputs:
new set the set of concept which must be added to the lattice.
searched := fg new set := fg for a 2 extent(o) loop y := maximal element(a) current := fyg for x 2 current loop if x 62 searched then searched:insert(x) new concept := join(o; x) if (new concept 2 B) or (new concept 2 new set) then
NULL else new set:insert(new concept) end if else current:delete(x) end if current := next cochain(current)
end loop end loop
Figure 12: Insertion Algorithm for Formal Concept Lattice.
attribute of the object is tested to see if it contains a generator concept. If the concept is a generator concept then a new concept is generated and added to a list of new concepts. If the new concept is already present in the lattice or the list of new concepts then the generated concept is discarded. Once all new concepts have been generated they are inserted into the lattice. The insertion algorithm is shown in Figure 3.1. Concepts when stored in sets are ordered in the same way as objects are ordered in the context. The means that searching for concepts in both the visited set and the new set has complexity O(log n). All ordered sets in this project were implemented as Red-Black Trees which are a binary implementation of 3-4 Balanced Trees. The Hasse diagram data structure is implemented as a set of nodes each of which has an unordered set of child nodes, and an unordered set of parent nodes. The top set and the bottom-set are stored in this data structure. The implementation of this data structure and the operations de ned on it are described by Ellis[4].
3.2 Embedding SNOMED in a Formal Concept Lattice
Before documents are added to the context all generalizations of the medical concepts extracted from the document are added as attributes to the document. Doing this imposes the generalization hierarchy contained in SNOMED on the attributes. i.e., if x is a generalization of the medical concept y then if document d contains concept y then d will also be assumed to contain x since x is just a generalization of concept y found in the document. The eect on the lattice is that the maximal lattice concept which has y in its extent will also have x in its extent. Adding the generalization of a SNOMED concept to the attributes of a document is part of the the way our knowledge representation views the world rather than being part of how it reasons about the world. From the lattice, certain observations can be made about relationships between attributes. Imposing the SNOMED hierarchy as part of the perception mechanism means that the relation enforced on the attributes can be inferred from the lattice. However, the relation is supplemented by the relations imposed by the combinations in which attributes appear in the document. Since we know that the SNOMED hierarchy is embedded in the lattice it can be used to index the lattice structure. For example, if we want to nd all lattice concepts that contain the conjunction of two SNOMED concepts for which one is the generalization of the other, then we can simply take the down set of the maximal lattice concept that contains the more speci c of the two SNOMED concepts. This is developed more formally in Theorem 2 below.
Theorem 2 Let B be the formal concept lattice derived from a context < G; M; I >, x1; x2 2 M , and (A; B) is the maximal element in B such that x1 2 B . Then (A; B) is unique and 8g 2 G gIx1 implies gIx2 if and only if x1 2 B . Proof Assume that there exists (A1 ; B1) 2 B such than (A1 ; B1 ) 6 (A; B) and x2 2 B1 . The join of (A1; B1 ) and (A; B) equals ((A1 [ A) ; B1 \ B) (A; B) is absurd since (A; B) is maximal. Therefore (A1; B1 ) does not exist. ): 8g 2 G gIx1 ! gIx2 . Now B can be constructed as B = Sj J fgj g and we know that x1 2 B . Hence, 8j 2 J , x1 2 fgj g and 8j 2 J x2 2 gj since gIx1 ! gIx2 and so x2 2 B . 2 (: x1 2 B and x2 2 B . Now 8(A1 ; B1 ) x1 2 B1 implies x2 2 B since (A1 ; B1) (A; B) since (A; B) is maximal. Now gIx1 implies (fgg ; fgg ) is a concept and (fgg ; fgg ) (A; B) since x1 2 fgg and so x2 2 fgg implies gIx2 . 2 Corollary 1 Let B be the formal concept lattice derived from a context < G; M; I >, x1; x2 2 M such that for all g 2 G x1 2 fgg implies x2 2 fgg . Then for all (A; B) 2 B x1 2 B implies x2 2 B . 00
0
0
2
0
00
0
00
0
0
0
0
0
Theorem 2 gives a mechanism for nding all documents containing all medical concepts in a set of medical concepts. Corollary 1 says that the SNOMED hierarchy is embedded in the formal concept lattice, i.e. we get the attribute implication given to us by SNOMED. For example, if one SNOMED concept is a generalization of another then the presence of the specialized SNOMED concept in the extent of lattice concept implies the existence of the generalized SNOMED concept in the extent of that lattice concept.
Integumentary System
Musculosketal System
Bone
Body Region
Shoulder Girdle
Axial Skeleton
Rib
Hematopoitic
Cell
Clavicle
Tenth Rib
Bony Tissue
Figure 13: Context Data Structure
3.3 Formulating User Queries
In order to use the system a user must be able to formulate a query that represents their information need. In this system, the user formulates the query by walking through the SNOMED hierarchy specifying the level of specialization for each concept she is interested in. For example, if we had the SNOMED hierarchy shown in Figure 13 and the dark shaded region indicates concepts chosen by the user, then the union of the dark and light shaded regions indicates the concepts present in each document returned by the system. By submitting this query the user is saying that she is interested in documents containing the concepts \Bony Tissue", \Rib" and \Body" Region. Although this is a contrived example, the user could be requesting documents containing a set of SNOMED concepts because they are related in some way. When they appear together in the patient discharge summary this may suggest some other piece of information about the patient. The assumption is that documents are being retrieved by a domain expert not because they conform to some arbitrary SQL query that the medical expert has interpreted from a relational schemata but rather because a query explicates some part of the domain theory that the expert expresses an interest in. When the user extends the total shaded region in Figure 13 by selecting more concepts the user is increasing the specialization of the request by requiring more concepts, and more specialized concepts, to be present in the document set. The SNOMED concepts on the periphery of the total shaded region are important because the user is requesting that these concepts, or specializations of these concepts, all exist in each document returned. Concepts not on the boundary, but inside the total shaded region, will be present in each document returned as a consequence of concepts which exist on the boundary of the shaded region. This is because the concepts on the boundary are specializations of concepts in the interior. The user is able to formulate the query by using the folded Hasse diagram representation of the SNOMED hierarchy of concepts. The advantage of formulating the query in terms of the SNOMED hierarchy is that the domain expert is using the same framework for viewing the domain that is used by the indexing system to form a view of the document set.
3.4 Satisfying User Queries
The set of documents matching a query can be retrieved by nding the meet of the maximal elements of each SNOMED concept on the boundary of the region of selected concepts. The meet of these maximal lattice concepts has an extent that at least contains any SNOMED concepts in the shaded region and the intent of the meet lattice concept will contain the union of the documents sets containing each concept in the SNOMED hierarchy.
More formally. Let < G; M; I > be a context, and B the lattice of concepts derived from the context. Let be a tree structured partial W relation de ned on M. Let Q M be a set of attributes which represents a query. Let (A; B) = q Q where maximal(q) is the maximal lattice concept. Then Q B and g 2 A implies fgg 2 B. Since only the extent of each concept is stored in the data structure that represents the lattice, the intent of the maximal meet of a user query cannot be calculated directly and must be calculated by traversing the Hasse diagram. An algorithm for performing such a traversal is given in [4]. This traversal of the lattice structure might be replaced by an explicit calculation if a term encoding for the concept lattice were calculated. 0
2
4 Future Work Although signi cant advice was provided from various experts in Medical Infomatics, the question of how well the system can be used to support epidemiological research was only addressed in a general way. It therefore remains to be seen how acceptable the system will be to medical experts in the clinical domain. Many documents admitted to the system have a complex hierarchical structure composed of specialized sections containing information from a restricted portion of the domain. A dicult undertaking would be to try and incorporate this information about the document structures in which concepts were found into the knowledge representation framework. Using tri-state lattices[10] is one approach that might prove useful to indicate modality and context. A system for incorporating negative attributes in formal concept analysis has been proposed by Martin [7]. The approach used in this project could be extended to include these negative attributes. There is however no currently implemented mechanism for dealing with the negative concepts. However, such an approach is important to medical infomatics. Another area of further work is comparing the performance of the concept lattice approach and existing IR approaches. A diculty associated with such a comparison is that the system implemented in this research must be applied to a domain speci c corpus of text documents that has a structured hierarchy of concepts from that domain. We are unaware of any public collections of discharge summaries whose relevance to a set of queries has been decided by an a set of experts. If such a collection became available, further work would be a comparison of the performance of the technique proposed in this research and a variety of existing techniques in the text retrieval community.
5 Conclusion The research proposes a scheme for combining expert knowledge in the formal concept analysis knowledge representation framework. Furthermore, it has been shown that this scheme is viable by successfully applying it the task of information retrieval. While there have been attempts to apply formal concept analysis to text retrieval in the past the novel approach in this project was the dual realization that the domain of the text documents should be restricted to a portion of a rich domain theory and that expert knowledge can be incorporated in the framework and used to increase eciency and performance. The mechanism for incorporating domain knowledge has a strong intuitive basis. We have re-enforced these intuitions with a mathematical treatment of that mechanism. A system was implemented allows the user to characterize her information need by walking a hierarchy of concepts and retrieving documents corresponding to this need. The documents indexed by the system can have an arbitrarily complex structure represented by SGML. While the research was aimed at proof of concept, the successful construction of a fully functional system for information retrieval and medical infomatics is a measurable outcome.
Acknowledgment
Richard Cole was sponsored by the DSTO, ITD as part of a CEED agreement. We also gratefully acknowledge the expertise of Andrew Burrow and Drs. Don Walker & Ross Wilkinson who contributed to the research. Any remaining errors are our own. Thanks also to Dr. Andrew Thorton from the RAH for access to the excellent discharge data.
References
[1] Read Clinical Coding System (RCC). Leicester, 1995. [2] R. A. C^ote, editor. Systematized Nomenclature of Medicine. Skokie Illinois, 1982. [3] B. A Davey and H. A Priestly. Introduction to Lattices and Order. Press Syndicate of the University of Cambridge, 1990. [4] Gerard Ellis. Managing Complex Objects. PhD thesis, Department of Computer Science, University of Queensland, 1995. [5] B. L. Humphreys and D. A. B. Lindberg. Building the uni ed medical language system. In L. W. Kingsland, editor, Proceedings of the 13th SCAMC, pages 475{480, Washington D.C., 1989. [6] ISO. ISO 8879 Information Processing - Text and oce systems - Standard Generalized markup language (SGML). International Organisation for Standards, October 1986. [7] C Nowak and P Eklund. Issues in the eent use of a relation ontology for conceptual reasoning. International Symposium of Knowledge, Retrieval, Use and Storage for Eciency (KRUSE), pages 194{198, 1995. [8] Ross Wilkinson. Eective retrieval of structured documents. Proceedings of the Seventeenth Annual Internations ACM-SIGIR on Research and Development in Information Retrieval, pages 311{317, July 1994. [9] R Wille. Restructuring lattice theory: An approach based on hierarchies of concepts. Ordered Sets, pages 445{470, 1982. [10] R Wille and F Lehmann. A triadic approach to formal concept analysis. International Conference on Conceptual Structures, 1995.