SENEX, An Object-Oriented Biomedical Knowledge Base
Sheldon Ball, Ph.D., M.D., Lawrence Wright, Perry Miller, Ph.D., M.D.
Medical Informatics Program and Department of Anesthesiology, Yale University School of Medicine, New Haven, CT 06510
0195-4210/89/0000/0085$01.00 © 1989 SCAMC, Inc.
Abstract:
We are currently developing an object-oriented knowledge base (SENEX) in the domain of neurodegeneration and loss of memory in aging. Initially, we are focusing on three sets of issues in the representation of biomedical information. First, we are seeking to extend the Medical Subject Headings (MeSH) nomenclature to include new classes of biomedical entities and to include relationships among those entities. Second, we are structuring biomedical information rather than categorizing text for bibliographic retrieval. Finally, we are exploring ways in which such information could be used in an interactive system created for purposes of education and for designing basic research experiments. This article describes the current behavior of SENEX, which is being developed using the COMMON LISP Object System (CLOS). It also discusses various issues raised and plans for future development.
Introduction:
There is an explosion of new information in the biomedical domain. One strategy for a research scientist to cope with this information explosion is to specialize his/her expertise so as to reduce the amount of directly relevant information published each month. However, to allow the exploration of problems that may draw relevant information from a wide range of disciplines, some other means of coping with the information explosion is essential. For this task, a vast amount of information must be represented in a form that allows specific retrieval as well as computational processing of that information. The processed information or knowledge must then be made available in a form which facilitates a conceptual understanding of the ideas represented in that information.
Object-oriented programming in COMMON LISP has many features useful for developing a biomedical knowledge base. Object-oriented programming is particularly beneficial for large programs that are written by multiple authors (Moon 1989). An object-oriented program is usually designed and constructed in a modular fashion. The object-oriented style of programming makes it easier to organize large programs, and it helps decompose complex problems into functional modules. The COMMON LISP Object System (CLOS) comprises a set of tools for developing object-oriented programs in COMMON LISP (Keene 1989).
We are exploring these issues of knowledge representation in a domain of increasing medical significance. Life offers but two options, to grow old or to die young. Most people grow old. However, the quality of later life is diminished by the process of neurodegeneration, and loss of memory is the most common complaint (Thompson 1986). Indeed, the quest for immortality is among the oldest desires of our species (Walford 1983). The domain of neurodegeneration in aging is well suited to a large-scale programming effort by multiple authors and is likely to remain a domain of medical significance for the foreseeable future. The domain encompasses a bewildering diversity of information well suited to testing strategies for knowledge representation and the use of object-oriented programming. The benefits of organizing that information may also have significant implications for gerontologic research. In this paper we describe an object-oriented program, SENEX, in which CLOS has been used to represent biomedical information in the domain of neurodegeneration in aging.
Classes and Inheritance of Properties:
The COMMON LISP Object System has been used to create a hierarchical structure of object classes. Nomenclature for object classes adheres as closely as possible to MeSH terms (the Medical Subject Headings of the National Library of Medicine), and a mapping between MeSH terms and object classes is implemented. The class structure is to a large extent derived from the MeSH Tree Structure but differs in certain fundamental ways. Classes of objects inherit properties from their superclasses and conversely pass properties on to their subclasses.
Consider, for example, the NMDA receptor, the activation of which leads to calcium currents in postsynaptic membranes of cortical neurons (Muller et al., 1988). These calcium currents linked to the NMDA receptor initiate the phenomenon of long-term potentiation, one of the biological substrates of long-term memory (Brown et al., 1988). The NMDA receptor is represented as follows:
    NMDA-RECEPTOR
    SUPERTYPES: (SYNAPTIC-RECEPTOR IONOPHORE-LINKED-RECEPTOR)
    SUBTYPES: NIL
    PROPERTIES: (((ION-CHANNEL :NAME NMDA-ION-CHANNEL)
                  (COFACTORS :COFACTOR GLYCINE)
                  (COMPARTMENT: SYNAPTIC-MEMBRANE)
                  (KD :LIGAND GLUTAMATE :VALUE HIGH)
                  (KD :LIGAND ASPARTATE)
                  (KD :LIGAND HOMOCYSTEATE)
                  (KD :LIGAND QUINOLINATE)
                  (KD :LIGAND GLYCINE :VALUE 1 :UNITS MM)
                  (KI :LIGAND 2-AMINO-5-PHOSPHONOVALERATE)
                  (KI :LIGAND 2-AMINO-5-PHOSPHONOHEPTANOATE)
                  (KI :LIGAND PHENCYCLIDINE :TYPE NON-COMPETITIVE)
                  (KI :LIGAND MK-801 :TYPE NON-COMPETITIVE)
                  (KI :LIGAND KETAMINE :TYPE NON-COMPETITIVE)))
    STATE: "no value"
    OCCUPANT: "no value"
    SUBSIDIARY-OCCUPANTS: "no value"
    QUALITIES: "no value"
The class structure represents "is-a" relationships among object classes. Slots represent intrinsic properties, or attributes, of the classes. Properties or slot values may be assigned when the class is defined or at the time of object instantiation (when a specific instance of a class is created), as follows:
    (instantiate 'NMDA-receptor
      :qualities '((location :structure hippocampus
                             :cell ((neuron :morphology pyramidal-cell
                                            :transmitter glutamate)))))
This code creates an instance of NMDA-receptor with specific properties; other properties are inherited from the NMDA-receptor class and its superclasses. The instance is qualified as located in hippocampal pyramidal neurons that use glutamate as a transmitter. Slots may appear in an instance that were not explicitly assigned at the time of defining the class or at the time of instantiation. These slots were inherited from superclasses. Figure 1 shows the hierarchical structure from which NMDA-receptor inherits properties. Slots that the various classes contribute to NMDA-receptor are also shown. Classes may have multiple superclasses and in turn may have multiple subclasses. Notice that the class structure is itself informative. Thus an NMDA-receptor is a synaptic-receptor and an ionophore-linked-receptor.
Figure 1. Class hierarchy from which NMDA-receptor inherits properties, with the slots contributed by each class (diagram not reproduced; classes listed from the top of the hierarchy down):
    med-object                  slots: instance-id, part, part-of, class-name, location, ref
    chemical-substance          slots: MW, site
    protein                     slots: gene, compartment
    receptor                    slots: ligand, cell, kd, agonist, antagonist
    membrane-protein
    ionophore-linked-receptor   slots: ion, conductance
    synaptic-receptor           slots: compartment (:initform 'synaptic-membrane)
    NMDA-receptor
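As a rough illustration of how the class structure in Figure 1 might be written down in CLOS, the following sketch defines the classes with defclass. The class and slot names come from Figure 1, but the chain of superclasses between med-object and receptor is assumed here (the text states only that NMDA-receptor is a synaptic-receptor and an ionophore-linked-receptor), only some of the slots are included, and the standard make-instance stands in for SENEX's own instantiate.

```lisp
;; Sketch only: class and slot names follow Figure 1; the chain of
;; superclasses above RECEPTOR is an assumption, not SENEX's definition.
(defclass med-object ()
  ((instance-id :initarg :instance-id :initform nil)
   (part-of     :initarg :part-of     :initform nil)
   (ref         :initarg :ref         :initform nil)))

(defclass chemical-substance (med-object)
  ((mw   :initarg :mw   :initform nil)
   (site :initarg :site :initform nil)))

(defclass protein (chemical-substance)
  ((gene        :initarg :gene        :initform nil)
   (compartment :initarg :compartment :initform nil)))

(defclass membrane-protein (protein) ())

(defclass receptor (membrane-protein)
  ((ligand :initarg :ligand :initform nil)
   (kd     :initarg :kd     :initform nil)))

(defclass ionophore-linked-receptor (receptor)
  ((ion         :initarg :ion         :initform nil)
   (conductance :initarg :conductance :initform nil)))

;; The more specific :initform overrides the one inherited from PROTEIN.
(defclass synaptic-receptor (receptor)
  ((compartment :initform 'synaptic-membrane)))

;; Multiple superclasses, in the order given by the SUPERTYPES listing.
(defclass nmda-receptor (synaptic-receptor ionophore-linked-receptor) ())

;; An instance receives slots from every class above it.
(defparameter *receptor*
  (make-instance 'nmda-receptor :ligand 'glutamate))

(slot-value *receptor* 'compartment)           ; default from synaptic-receptor
(typep *receptor* 'ionophore-linked-receptor)  ; the second superclass also applies
```

In this sketch the compartment slot is declared on protein and given its default by synaptic-receptor, which is one way a slot value can appear in an instance without being assigned at instantiation.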
Static Relationships:
Static relationships (relationships in some sense presumed to be static) are represented through object classes, and as instantiations of classes with particular values. For example, a particular relationship between Huntington's disease and the enzyme 3-hydroxyanthranilate oxygenase is represented as follows:
    (instantiate 'disease-related-change
      :direction 'increase
      :agent 'unspecified
      :context 'Huntington-chorea
      :object '((3-hydroxyanthranilate-oxygenase
                 :qualities ((location :structure putamen :environment tissue)
                             (genetic :species human))))
      :english "3-Hydroxyanthranilate-oxygenase is increased in striatum and other brain regions of Huntington's disease patients."
      :ref 'ui88305371)
This relationship represents an assertion found in the literature. An English statement of the assertion is included, together with the article in which it is found (indicated by its unique identifier from MEDLINE). Assert-relation is a LISP macro which creates a relationship object of class disease-related-change. The slot values of subject, process, context, etc. are assigned at the time of instantiation. Once assertions of this sort are in the database, they can be retrieved in a flexible fashion. Suppose, for example, one wished to know whether any enzymes are increased in the striatum of Huntington's disease patients. Such a query is formulated as follows:
    (retrieve '((disease-related-change
                 :direction increase
                 :object ((enzyme :qualities ((location :structure striatum))))
                 :disease Huntington-chorea)))
The assertion described above is then retrieved as shown below. (If the user wanted further information, he/she could query MEDLINE using the unique identifier of the reference.)
    ENGLISH: 3-Hydroxyanthranilate-oxygenase is increased in the putamen of Huntington's disease patients.
    SPECIES: HUMAN
    ENVIRONMENT: TISSUE
    PROCESS: INCREASE
    SUBJECT: UNSPECIFIED
    DEGREE: (no value)
    CONTEXT: HUNTINGTON-CHOREA
    LOGICAL: T
    OBJECT: 3-HYDROXYANTHRANILATE-OXYGENASE: (:LOCATION PUTAMEN)
    DEGREE: (no value)
    CONFIDENCE: (no value)
    REF: UI88305371
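The query asks for an enzyme in the striatum, while the stored assertion names 3-hydroxyanthranilate oxygenase in the putamen, so the match depends on following "is-a" and "part-of" links. A minimal sketch of one way such a match could be computed is given below. The tables *is-a-children* and *part-of-children*, the functions descendant-p and assertion-matches-p, and the flat property-list form of the assertion are all hypothetical; this is not SENEX's retrieve implementation.

```lisp
;; Hypothetical child lists: each entry is (parent child ...).
(defparameter *is-a-children*
  '((enzyme 3-hydroxyanthranilate-oxygenase)))

(defparameter *part-of-children*
  '((striatum putamen caudate-nucleus)))

(defun descendant-p (item ancestor table)
  "True if ITEM is ANCESTOR itself or is reachable from ANCESTOR via TABLE."
  (or (eq item ancestor)
      (some (lambda (child) (descendant-p item child table))
            (rest (assoc ancestor table)))))

(defun assertion-matches-p (assertion object-class structure)
  "True if ASSERTION's object is-a OBJECT-CLASS and its location is part-of STRUCTURE."
  (and (descendant-p (getf assertion :object) object-class *is-a-children*)
       (descendant-p (getf assertion :structure) structure *part-of-children*)))

;; Simplified stand-in for the stored disease-related-change assertion.
(defparameter *assertion*
  '(:object 3-hydroxyanthranilate-oxygenase
    :structure putamen
    :context huntington-chorea
    :ref ui88305371))

;; Query: is any enzyme changed in the striatum?
(assertion-matches-p *assertion* 'enzyme 'striatum)
```

The same assertion would not match a query about the hippocampus, because putamen does not appear among the hippocampus's "part-of" children in these tables.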
The assertion about 3-hydroxyanthranilate oxygenase is retrieved because (1) 3-hydroxyanthranilate oxygenase is an enzyme and (2) the putamen is part of the striatum. This retrieval is possible because lists of "is-a" and "part-of" children are stored with each class of objects.
Dynamic Relationships:
Dynamic relationships are currently represented as generic functions and methods. For example, the reaction of Ca+2/calmodulin-dependent protein kinase III with peptide elongation factor 2 is represented as follows:
    (defmethod get-rxn-conditions ((enzyme CaM-kinase-3)
                                   (substrate peptide-elongation-factor-2))
      (let ((conditions '((Mg+2 0.010 M) (ATP 0.005 M)
                          (calmodulin 0.00000059 M) (Ca+2 0.000150 M)
                          (ref UI88261574))))
        (values 'CaM-kinase-3
                '(peptide-elongation-factor-2)
                '(peptide-elongation-factor-2-phosphate)
                conditions)))
The parameter specializers of defmethod indicate the class to which each argument (e.g., enzyme, substrate) must belong for the method to be applicable. In this case, the enzyme must be of the class CaM-kinase-3 and the substrate of the class peptide-elongation-factor-2. When called, the method returns four values: 'CaM-kinase-3 (the enzyme), '(peptide-elongation-factor-2) (a list of substrates), '(peptide-elongation-factor-2-phosphate) (a list of products), and conditions (a list of conditions).
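To try a method like this outside SENEX, the classes it specializes on must exist. The following self-contained sketch supplies stub classes and shows that the method is selected on the classes of both arguments. The stub hierarchy, the fallback method, and the helper known-reaction-p are assumptions made for illustration and are not part of SENEX.

```lisp
;; Stub classes: this hierarchy is assumed for illustration only.
(defclass enzyme () ())
(defclass kinase (enzyme) ())
(defclass protein-kinase (kinase) ())
(defclass cam-kinase-3 (protein-kinase) ())
(defclass peptide-elongation-factor () ())
(defclass peptide-elongation-factor-2 (peptide-elongation-factor) ())

(defgeneric get-rxn-conditions (enzyme substrate)
  (:documentation "Return enzyme, substrates, products and conditions."))

;; Hypothetical fallback: no reaction is recorded for this pair of classes.
(defmethod get-rxn-conditions ((enzyme t) (substrate t))
  (values nil nil nil nil))

;; The method from the text; it applies only when both arguments match.
(defmethod get-rxn-conditions ((enzyme cam-kinase-3)
                               (substrate peptide-elongation-factor-2))
  (let ((conditions '((Mg+2 0.010 M) (ATP 0.005 M)
                      (calmodulin 0.00000059 M) (Ca+2 0.000150 M)
                      (ref UI88261574))))
    (values 'cam-kinase-3
            '(peptide-elongation-factor-2)
            '(peptide-elongation-factor-2-phosphate)
            conditions)))

(defun known-reaction-p (enzyme substrate)
  "True if a specific method supplied an enzyme name for this pair."
  (not (null (nth-value 0 (get-rxn-conditions enzyme substrate)))))

;; Both arguments match the specializers, so the specific method runs.
(known-reaction-p (make-instance 'cam-kinase-3)
                  (make-instance 'peptide-elongation-factor-2))
```

An instance of the more general class kinase does not satisfy the cam-kinase-3 specializer, so in this sketch that call falls through to the fallback method.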
Suppose, for example, one were interested in finding reactions of a protein kinase with a peptide elongation factor. A query is formulated as follows:
    ? (show-rxn protein-kinase '(peptide-elongation-factor))
    PEPTIDE-ELONGATION-FACTOR-2 --CAM-KINASE-3--> PEPTIDE-ELONGATION-FACTOR-2-PHOSPHATE
    Reaction Conditions
    (MG+2 0.01 M) (ATP 0.005 M) (CALMODULIN 0.00000059 M) (CA+2 0.00015 M) (REF UI88261574)
    NIL
Show-rxn is a LISP macro which takes as arguments an enzyme and a list of substrates. The execution of this macro results in a call to the generic function get-rxn-conditions. The reaction of interest is retrieved because Ca+2/calmodulin-dependent protein kinase III is a protein kinase and peptide elongation factor 2 is a peptide elongation factor. Retrieval is again accomplished using "is-a" and "part-of" relationships. Non-enzymatic reactions are retrieved by supplying nil as the first argument to show-rxn.
Simulation:
In addition to retrieval, a reaction itself might be simulated. We are just starting to explore the implications of including this capability in SENEX. We are including a primitive version of this capability now because we want to be sure that the structure being used for retrieval of information may ultimately be useful to simulate various biologic processes. In the current, very preliminary version of this capability, the LISP macro rxn simulates the reaction by deleting substrate(s) and making product(s).
A listing of enzyme, substrates, and products (other factors which must also exist as specified in conditions are not shown) prior to and after calling rxn indicates that a substrate (peptide elongation factor 2) was deleted and a product (peptide elongation factor 2 phosphate) created, but that no change in the enzyme (Ca+2/calmodulin-dependent protein kinase III) occurs. In addition, the reaction consumes an ATP and generates ADP. This is handled by the generic function run-reaction and a ":before" method as follows:
    (defmethod run-reaction :before ((enzyme kinase)
                                     (substrate chemical-substance))
      (delete-object 'ATP)
      (make-object 'ADP)
      (values))
The generic function run-reaction calls the ":before" method prior to calling the primary method. The ":before" method is applicable whenever the enzyme argument is a kinase and the substrate argument is a chemical substance. Under standard method combination all applicable ":before" methods run, most specific first, so more specialized kinases may add their own ":before" methods without overriding this one. Thus, this ":before" method encodes the knowledge that kinases consume ATP, and this property is inherited by all kinases.
Current Status:
SENEX is currently implemented in prototype form with a limited set of knowledge. The present system contains knowledge about 410 different object classes, of which 340 correspond to MeSH terms. In addition, information represented in the form of object class slot values, methods, and assert-relation macros from approximately 100 papers and three textbooks is included.
We plan to assess the capabilities of the present system, making only modest additions to the knowledge base until we are comfortable that the system offers sufficient representational power. In addition, we plan to define methods for retrieving and processing protein structure, nucleic acid sequence, and gene mapping data. Once this phase is complete, we plan to significantly increase the amount of knowledge contained, so that it can be tested first as a knowledge retrieval tool and secondly as a tool for designing basic research experiments.
Summary:
A person can only hold so much information in their head at one time. What is not in mental storage must be searched for: through the library, in textbooks, via online abstracts, or through discussions with colleagues. This endeavor frequently retrieves too much and/or too little information and may take days or weeks to accomplish. The focused train of thought is disrupted. With the explosion of publications in the biomedical literature, it is likely that more and more relevant information will be accessible by search rather than from one's memory. Thus the utility of organizing information is quite clear.
In addition to retrieval of information, it would be very useful to simulate various biologic processes. Such simulations may reveal properties emergent from the complex interactions of many processes. Simulation might prove particularly useful
for study of systems such as synaptic remodelling that, because of their small size, are inaccessible to direct study by electrophysiologic and biochemical methods (Gamble and Koch, 1987). Ultimately, it may prove useful to simulate experiments in order to assess their likelihood of success.
Object-oriented programming is useful in attempting to represent more fully the complex and diverse field of biomedical knowledge. Operations concerning the structure and interactions of biomedical entities, as well as retrieval of (possibly remote, electronically accessed) references to them, can be defined in a uniform way at the top level of the system, and then extended and redefined as needed within specialized modules dealing with particular areas of knowledge. The benefits of data abstraction and modular development are particularly important where a large system needs to be developed, maintained and extended by many people, possibly working at different sites, as would likely be needed for a full-fledged version of SENEX. Other features, such as CLOS's flexible scheme for inheritance of properties and behavior within the class hierarchies, may also prove useful.
Acknowledgments:
This research was supported in part by NIH Grant T15 LM07056 and by NIH Contract N01 LM63524 from the National Library of Medicine.
References:
Brown T.H., Chapman P.F., Kairiss E.W. and Keenan C.L. Long-term synaptic potentiation, Science 242:724-728, 1988
Cote L. Aging of the brain and dementia, in: Principles of Neurobiology, Kandel E.R. and Schwartz J.H. (eds.), Raven Press, New York, NY, pg 784, 1985
Gamble E. and Koch C. The dynamics of free calcium in dendritic spines in response to repetitive synaptic input, Science 236:1311-1315, 1987
Keene S.E. Object-Oriented Programming in COMMON LISP, Addison-Wesley, Reading, MA, 1989
Moon D.A. Foreword to Keene S.E. Object-Oriented Programming in COMMON LISP, Addison-Wesley, Reading, MA, 1989
Muller D., Joly M. and Lynch G. Contributions of quisqualate and NMDA receptors to the induction and expression of LTP, Science 242:1694-1698, 1988
Thompson R.F. The neurobiology of learning and memory, Science 233:941-947, 1986
Walford R.L. Maximum Life Span, W.W. Norton, New York, NY, 1983