RASTA: A Distributed Temporal Abstraction System to facilitate Knowledge-Driven Monitoring of Clinical Databases
Martin J. O’Connor, M.Sc., William E. Grosso, M.S., Samson W. Tu, M.S., Mark A. Musen, M.D., Ph.D.
Stanford Medical Informatics, Stanford University School of Medicine, Stanford, CA 94305-5479, USA

Abstract
The time dimension is very important when reasoning with clinical data. Unfortunately, the task of temporal reasoning is inherently computationally expensive. As the problems tackled by clinical decision support systems become more varied, increased demands will be placed on the temporal reasoning component, which may lead to slow response times. This paper addresses this problem. It describes a temporal reasoning system called RASTA that uses a distributed algorithm that enables it to deal with large data sets. The algorithm also supports a variety of configuration options, enabling RASTA to deal with a range of application requirements.
Keywords: decision support systems, decision support techniques, medical informatics, knowledge-based systems, temporal reasoning, temporal abstraction.

Introduction
Almost all clinical data have a temporal dimension [1]. For example, clinical interventions occur at particular points in time or over periods of time; diseases have onsets and durations; laboratory tests are usually recorded at particular points in time. Automated systems working with clinical data must be able to reason with this type of information, a process called temporal reasoning. Examples of tasks that depend heavily on temporal reasoning include patient monitoring, management of patients on clinical guidelines, and visualization of longitudinal patient data. A crucial part of temporal reasoning is creating high-level, temporally extended concepts from raw time-stamped data [2]. This task is often called temporal abstraction. When clinical systems move from reasoning with small amounts of clinical data to reasoning with large amounts, they require a scalable temporal abstraction architecture. This requirement is particularly vital when real-time response is required. Such systems may become increasingly common, driven, for example, by large-scale deployments of decision support systems in hospitals and large medical centers, and by the demand for automated tools that perform real-time monitoring of public health databases [3].

Background
Our group has more than 15 years’ experience in building clinical decision support systems, primarily for managing patients on clinical protocols. A major result of this work has been the EON environment, a component-based architecture for building automated clinical decision support systems [4]. A robust means of reasoning with temporal information is essential to EON; consequently, we have built a number of temporal reasoning tools to support this task. They include Chronus II [5,6], a SQL-like temporal query language; Tzolkin [7], a temporal database mediator; and RÉSUMÉ [8], a knowledge-based system for performing temporal reasoning. RÉSUMÉ implements Shahar's knowledge-based temporal-abstraction problem-solving method [8]. RÉSUMÉ generates interval-based, time-related abstractions from time-oriented clinical data and operates on data values that are time stamped with points or intervals. An important feature of RÉSUMÉ is that it is knowledge-driven. Temporal concepts are described in a knowledge base in the context of a high-level description of a medical domain. The knowledge base explicitly represents the knowledge needed to perform abstractions on time-oriented clinical data. For example, abstractions are defined in terms of the primitive data that they depend on and their relationship to other abstractions in an abstraction hierarchy. This declarative approach provides great flexibility for describing the temporal concepts that a reasoning system encounters. However, RÉSUMÉ does not scale to the significantly higher data-processing requirements of working with large amounts of data. It was implemented as a standalone rule-based system and does not offer real-time response for anything other than small single-patient data sets. RÉSUMÉ’s fundamental problem is that there is an exponential relationship between the size of the data set it operates on and its memory and CPU requirements. 
In addition, it does not allow the abstraction task to be distributed. In response to this problem, we developed a system called RASTA (RASTA: A System for Temporal Abstraction), which enables our temporal reasoning system to scale to larger problem sizes. RASTA incorporates many ideas and concepts used by RÉSUMÉ and acts as the basis of a scalable architecture for performing temporal reasoning with clinical data. RASTA uses a distributed algorithm that allows independent evaluation of each abstraction in an abstraction hierarchy. As a result, it can use separate processes for each portion of an abstraction tree for each patient. This feature allows RASTA to work on very large data sets. The algorithm also supports a variety of configuration options, enabling it to deal with a range of application requirements. It can, for example, be deployed as a single standalone process or be distributed among multiple processes on one or more machines.
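The per-abstraction, per-patient decomposition described above can be illustrated with a minimal Python sketch. It is not RASTA's implementation (RASTA is written in Java and uses CORBA); the hierarchy, the abstraction names, the cut-off values, and the functions `dependency_levels`, `run_units`, and `evaluate` are hypothetical. Each (abstraction, case) pair becomes a separate work unit, and units whose inputs are complete run concurrently on a thread pool, standing in for a single-process deployment. For simplicity, units are scheduled level by level rather than by streaming results between processes, each unit returns one value rather than a set of intervals, and the ordering step rejects circular definitions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical abstraction hierarchy: abstraction -> inputs it is defined over.
# Inputs that are not keys ("hgb", "wbc") are primitive, time-stamped parameters.
HIERARCHY: Dict[str, List[str]] = {
    "hgb_state": ["hgb"],
    "wbc_state": ["wbc"],
    "bone_marrow_toxicity": ["hgb_state", "wbc_state"],
}

Unit = Tuple[str, str]  # (abstraction, case_id): one independent work unit


def dependency_levels(hierarchy: Dict[str, List[str]]) -> List[List[str]]:
    """Level-by-level topological sort; raises ValueError on circular definitions."""
    deps = {a: [i for i in inputs if i in hierarchy] for a, inputs in hierarchy.items()}
    done: set = set()
    levels: List[List[str]] = []
    while len(done) < len(deps):
        ready = sorted(a for a, d in deps.items()
                       if a not in done and all(x in done for x in d))
        if not ready:
            raise ValueError("circular dependency between abstractions")
        levels.append(ready)
        done.update(ready)
    return levels


def run_units(hierarchy: Dict[str, List[str]], case_ids: List[str],
              evaluate: Callable[[str, str, Dict[Unit, object]], object],
              max_workers: int = 4) -> Dict[Unit, object]:
    """Evaluate every (abstraction, case) unit; units within one level run concurrently."""
    results: Dict[Unit, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in dependency_levels(hierarchy):
            snapshot = dict(results)  # read-only copy of lower-level results
            futures = {(a, c): pool.submit(evaluate, a, c, snapshot)
                       for a in level for c in case_ids}
            for unit, future in futures.items():
                results[unit] = future.result()
    return results


# Hypothetical raw data and cut-offs, for illustration only.
RAW = {("hgb", "p1"): 7.9, ("wbc", "p1"): 2.1, ("hgb", "p2"): 13.5, ("wbc", "p2"): 6.0}
CUTOFFS = {"hgb_state": ("hgb", 12.0), "wbc_state": ("wbc", 4.0)}


def evaluate(abstraction: str, case_id: str, done: Dict[Unit, object]) -> object:
    if abstraction in CUTOFFS:  # abstraction over a primitive parameter
        param, cutoff = CUTOFFS[abstraction]
        return "low" if RAW[(param, case_id)] < cutoff else "normal"
    # Derived abstraction: present only if every child state is low for this case.
    children = HIERARCHY[abstraction]
    return "present" if all(done[(ch, case_id)] == "low" for ch in children) else "absent"


results = run_units(HIERARCHY, ["p1", "p2"], evaluate)
print(results[("bone_marrow_toxicity", "p1")])  # present
print(results[("bone_marrow_toxicity", "p2")])  # absent
```

Swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` would move the units into separate processes, provided the evaluator and its arguments are picklable.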

Distributed Abstraction Algorithm
Before describing the algorithm in detail, we first outline the data it requires and then give a basic description of the temporal abstraction task.

Information Requirements
RASTA’s temporal abstraction algorithm uses four main data sources:
1. Domain Knowledge Base. RASTA’s temporal abstraction algorithm is knowledge-driven. This knowledge is described in an abstraction knowledge base, which gives a detailed description of all of the temporal abstractions that RASTA can perform in a particular domain with a particular data set. The knowledge base contains one or more abstraction hierarchies and a detailed specification for each abstraction.
2. Time Stamped Data. Ultimately, each abstraction hierarchy depends on source, or primitive, data. Our system describes the source of these data in a site-dependent mapping knowledge base. A basic assumption of RASTA is that all data are in a relational database; consequently, the mapping knowledge base must specify a database, table, and column name to identify the location of each primitive data component. It also contains type information for each data element. Each piece of primitive data used by RASTA must be time stamped. For example, a laboratory table must contain time stamps that indicate when each laboratory test was performed; a prescription table records the initial prescription date and, possibly, the time at which the prescription is no longer valid.
3. Contexts. In RASTA, all abstractions must be associated with a particular context. A context is a proposition that, intuitively, represents a state of affairs. For example, an abstraction may only be relevant during the administration of a certain type of drug. A context is used to limit the scope of inference and to direct the temporal abstraction algorithm to solve the problem at hand. Thus, an initial set of contexts must be provided to direct the abstraction process.
4. Case Identifiers. A set of case identifiers is the final input to RASTA’s temporal abstraction system. These identifiers indicate the cases in the database for which abstractions are to be performed.

Basic Temporal Abstraction
RASTA’s temporal abstraction algorithm has four basic subtasks. These tasks are a subset of the tasks outlined in Shahar's knowledge-based temporal-abstraction problem-solving method [8].

1. Context Restriction. The initial contexts provided to RASTA define the starting state for the temporal abstraction process. However, RASTA may also generate new contexts during the deduction process. It can deduce these new contexts by combining domain-specific events, abstractions, and existing contexts.
2. Vertical Temporal Inference. This subtask abstracts parameters and their values with contemporaneous time stamps into a value of a new, abstract parameter. An example would be generating a hemoglobin-state abstraction of low from a raw laboratory value of 7.9 g/dl.
3. Horizontal Temporal Inference. This subtask abstracts from parameters whose time intervals cover different, but meeting or overlapping, time periods. For example, two week-long intervals of a low hemoglobin parameter separated by a third week-long interval may be abstracted into one three-week low interval of hemoglobin. RASTA can also infer certain domain-specific logical conclusions from certain parameters. For example, it can assert that a parameter that is true over a large interval remains true over all of its subintervals: if a patient had AIDS throughout 1999, RASTA infers that he had AIDS throughout March 1999.
4. Temporal Interpolation. When appropriate, RASTA can bridge discontinuous temporal points or intervals. For example, if a patient’s hemoglobin is measured daily, and readings on consecutive days are low, RASTA can infer that the patient had a continuous period of low hemoglobin.

Distributed Temporal Abstraction
Horizontal temporal inferences and temporal interpolations can be extraordinarily expensive computationally. As the number of raw time-stamped data points increases, generating all possible temporal abstractions from them can become very expensive. Although response time may be acceptable when performing temporal abstractions for a single patient, attempts to do so for multiple patients can quickly lead to unacceptable computation times. We tackled this problem by developing a temporal abstraction algorithm that is parallelizable and distributable. The algorithm is as follows:
Step 1: Initialization. RASTA reads a domain knowledge base that describes the set of all possible temporal abstraction types (Figure 1). RASTA builds internal data structures that mirror this abstraction knowledge and stores the state of these data structures in a relational database. In addition, it reads a mapping knowledge base, which identifies the location of the primitive data specified in the knowledge base, and also saves this information to a database. Thus, later invocations of RASTA can rapidly re-read this saved state without repeating the potentially time-consuming processing of the domain and mapping knowledge bases. Finally, RASTA reads the initial contexts and case identifiers.
Step 2: Task Distribution. The domain knowledge base identifies all possible abstractions that RASTA can perform. For each possible abstraction, RASTA generates a standalone process that generates that abstraction for a single case. This process is responsible for performing the contemporaneous (vertical) abstraction, temporal inference, and temporal interpolation tasks for that abstraction and case. Its input is a set of active contexts and a single case identifier. Depending on the distribution scheme, each abstraction process is allocated to a particular machine or to a particular CPU on a machine. Allocation to a particular thread within a containing process is also possible.
Step 3: Process Connection. If an abstraction operates on primitive data, RASTA establishes a connection to a database that contains those data. If the abstraction is generated from other abstractions, RASTA establishes connections to the processes responsible for each of them. 
If necessary, outgoing connections to the invoking application are also initiated so that any derived abstractions can be supplied to the application, either by writing them to an external database or by supplying them through an inter-process communication mechanism. Eventually, all processes are connected in one or more abstraction process hierarchies, each mirroring the abstraction hierarchy with which it deals.
Step 4: Abstraction Generation. After the abstraction process hierarchy is established, RASTA starts the inference task. The processes that generate abstractions only from primitive data start first. They read the required raw data from the specified data source and begin generating temporal abstractions from those data. As each abstraction is generated, it is passed on to the next dependent abstraction process in the hierarchy. That process then uses the abstraction to perform its deduction task for its case, using the set of contexts initially supplied to it. This process, in turn, propagates any abstractions that it derives to processes further up in the abstraction process hierarchy.
Step 5: Abstraction Assertion. Eventually, all abstractions propagate through each process hierarchy and the abstraction task terminates. At this point, RASTA passes all generated abstractions to the calling application by one of two methods. The first method uses a custom XML data structure to pass the abstractions back to the application that invoked RASTA. Alternatively, abstractions can be inserted directly into a relational database. Before each process terminates, RASTA saves a representation of its current state to a relational database, so that the process can be resurrected if new data arrive. RASTA does not allow circular dependencies between abstractions or between abstractions and contexts, which ensures that the abstraction processes eventually terminate.

RASTA thus allows parallelization of the knowledge-based temporal abstraction method. Large case sets or deep abstraction hierarchies can be tackled by adding extra processing units. We have ensured that the communication overhead of this parallelization is minor, so that the abstraction task can be distributed without being overwhelmed by communication costs. Additionally, RASTA’s algorithm is data-driven and does not require complicated synchronization between abstraction processes, a characteristic that simplifies the distribution task.

[Figure 1: diagram showing the inputs to RASTA and the temporal abstraction processes it creates.]
Figure 1. The RASTA system. The inputs to the RASTA system are an abstraction knowledge base, a mapping knowledge base, time stamped primitive data, and a set of contexts and case identifiers. RASTA then creates a set of processes that carry out the temporal abstraction task. These processes may be distributed over multiple machines.
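The subtasks that each abstraction process performs for its case (vertical inference, horizontal inference, temporal interpolation, and the subinterval inference of the AIDS example) can be sketched in Python for the hemoglobin examples given under Basic Temporal Abstraction. This is an illustration under stated assumptions, not RASTA's algorithm: the cut-off `LOW_HGB_G_DL`, the day-based time axis, and the single `max_gap` rule used for interpolation are hypothetical; a knowledge-driven system would take such parameters from its abstraction knowledge base.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

LOW_HGB_G_DL = 12.0  # hypothetical cut-off (g/dl) for a "low" hemoglobin state


@dataclass(frozen=True)
class Interval:
    start: int   # first day of the interval (inclusive)
    end: int     # last day of the interval (inclusive)
    value: str   # abstract parameter value, e.g. "low"


def vertical_inference(readings: Iterable[Tuple[int, float]]) -> List[Interval]:
    """Map each time-stamped raw value to an abstract state with the same time stamp."""
    return [Interval(day, day, "low" if hgb < LOW_HGB_G_DL else "normal")
            for day, hgb in readings]


def horizontal_inference(intervals: Iterable[Interval], max_gap: int = 1) -> List[Interval]:
    """Join same-valued intervals whose start is at most max_gap days after the
    previous interval's end (this covers overlapping and adjacent days).

    The max_gap rule stands in for temporal interpolation. Each interval is compared
    only with the most recently emitted one, so a different value in between blocks a join.
    """
    merged: List[Interval] = []
    for iv in sorted(intervals, key=lambda i: (i.start, i.end)):
        last: Optional[Interval] = merged[-1] if merged else None
        if last is not None and last.value == iv.value and iv.start - last.end <= max_gap:
            merged[-1] = Interval(last.start, max(last.end, iv.end), last.value)
        else:
            merged.append(iv)
    return merged


def holds_during(iv: Interval, start: int, end: int) -> bool:
    """Subinterval inference: a proposition true over iv is true over any interval inside it."""
    return iv.start <= start and end <= iv.end


readings = [(0, 7.9), (1, 8.4), (2, 9.1), (3, 13.5)]  # (day, hemoglobin in g/dl)
episodes = horizontal_inference(vertical_inference(readings))
print(episodes)
# [Interval(start=0, end=2, value='low'), Interval(start=3, end=3, value='normal')]
print(holds_during(episodes[0], 1, 2))  # True
```

With `max_gap=1`, readings on consecutive days are joined into one low episode, as in the interpolation example; a longer gap between readings is not bridged.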

RASTA’s Implementation
RASTA is written in the Java programming language. It uses CORBA as its inter-process communication mechanism, and XML defines the format of the data exchanged between processes. All knowledge bases used by RASTA are written using Protégé-2000 [9], a knowledge-base authoring environment developed at Stanford. Protégé-2000 also provides automated assistance in acquiring abstraction knowledge from domain experts [10]. To avoid direct dependence on Protégé-2000, we defined an intermediate interface that provides an alternative way of specifying abstraction knowledge. This interface uses XML as its data description language. We also developed a translation program that dynamically maps Protégé-2000 knowledge-base data structures to the appropriate XML data structures. The XML-based interface provides a portable way for other components to use RASTA. A temporal query system called Chronus II is used to interact with the database [5,6]. Chronus II ensures that all primitive data used by RASTA are time stamped and that these time stamps have a consistent temporal format; both features are useful given the absence of a standard way to represent temporal information in relational databases.

Deployment Strategies
RASTA allows users to tune its deployment to their needs. A number of deployment strategies are possible. Some obvious strategies include:
Single Process, Multiple Thread: Dedicated threads within a single process carry out the abstractions. This strategy may be suitable for small abstraction sets.
Multiple Process, Single Computer: A dedicated process carries out each abstraction, with multiple processes running on the same computer.
Multiple Process, Multiple Computer: Abstraction processes are distributed among multiple computers.
Multi-CPU Parallel Computer: Abstraction processes are distributed among the CPUs on a parallel computer. 
An important prerequisite of this deployment configuration is that each process should not have significant resource requirements; our algorithm satisfies this condition.

Variations on these approaches are possible. For example, to minimize communication overhead, related abstractions can be clustered together in multithreaded processes. This configuration may prove useful in a typical deployed clinical decision support system, where multi-CPU servers or large networks of powerful computers are not usually available. With this configuration, the abstraction task can be distributed among a small network of clinic workstations, which typically have low CPU utilization. RASTA thus provides a flexible means of configuring the abstraction process to suit the needs of a wide range of decision support applications. In a small decision support system performing temporal abstractions for single patients, a single-process, multithreaded deployment configuration might be appropriate. A highly distributed configuration would probably be appropriate for large multi-patient data sets.

Truth Maintenance
Clinical data often arrive out of temporal order: results from some laboratory tests arrive days or weeks after the test is performed; data may be recorded erroneously in a database and later corrected; new information often causes clinicians to change their opinions about a patient’s earlier state. A temporal reasoning system must be able to deal with out-of-order data, and it must be able to do so without regenerating abstractions from scratch, particularly if it is used for large data sets. Triggers associated with the primitive data used by the temporal abstraction process are configured to inform RASTA when the data are modified. When a notification occurs, RASTA determines whether the process responsible for the affected dependent abstraction and case is already active. If the process is not active, RASTA generates a new process and supplies it with the state saved at the end of that process's last invocation. The new primitive data are then passed to the process, which begins generating abstractions anew. Again, abstractions are propagated through the abstraction process hierarchy, with new abstraction processes generated by RASTA as necessary. In this case, however, the process may involve retracting abstractions. Like assertions, retractions may propagate throughout the abstraction process hierarchy and may involve the modification of earlier abstractions.
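The trigger-based notification described above can be sketched with Python's standard `sqlite3` module. The table names (`lab_result`, `change_log`), the `READERS` mapping from primitive parameters to the abstractions that read them, and the polling function `pending_units` are hypothetical; the text does not specify how RASTA's triggers deliver notifications, so this shows only the general pattern of recording which (parameter, case) pairs changed so that the corresponding abstraction processes can be reactivated from their saved state.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lab_result (case_id TEXT, parameter TEXT, value REAL, measured_at TEXT);
CREATE TABLE change_log (id INTEGER PRIMARY KEY AUTOINCREMENT, case_id TEXT, parameter TEXT);
-- Record every insert or update of primitive data so the abstraction layer can react.
CREATE TRIGGER lab_insert AFTER INSERT ON lab_result
BEGIN INSERT INTO change_log (case_id, parameter) VALUES (NEW.case_id, NEW.parameter); END;
CREATE TRIGGER lab_update AFTER UPDATE ON lab_result
BEGIN INSERT INTO change_log (case_id, parameter) VALUES (NEW.case_id, NEW.parameter); END;
""")

# Hypothetical: which first-level abstractions read each primitive parameter.
READERS = {"hgb": ["hgb_state"], "wbc": ["wbc_state"]}


def pending_units(conn: sqlite3.Connection,
                  last_seen_id: int = 0) -> Tuple[List[Tuple[str, str]], int]:
    """Return the (abstraction, case_id) units to reactivate and the newest change-log id."""
    rows = conn.execute(
        "SELECT id, case_id, parameter FROM change_log WHERE id > ? ORDER BY id",
        (last_seen_id,)).fetchall()
    units = sorted({(a, case) for _, case, param in rows for a in READERS.get(param, [])})
    newest = rows[-1][0] if rows else last_seen_id
    return units, newest


conn.execute("INSERT INTO lab_result VALUES ('p1', 'hgb', 7.9, '2001-03-01')")
# A late correction to the earlier result fires the update trigger.
conn.execute("UPDATE lab_result SET value = 12.9 WHERE case_id = 'p1' AND parameter = 'hgb'")
units, last_id = pending_units(conn)
print(units, last_id)  # [('hgb_state', 'p1')] 2
```

Remembering `last_id` between polls means each change is handled once; the two changes to the same result collapse into a single unit to reactivate.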
Abstraction retraction can be computationally complex. RASTA must also ensure that knowledge-base modifications are reflected in updates to any abstractions that it has generated. To accomplish this task, the system uses a dynamic XML-based link to a Protégé-2000 knowledge base via an active intermediate mapping process. The link enables RASTA to detect changes to the knowledge base immediately. When it detects a change in the knowledge base, RASTA examines the change and determines how much of its abstraction information is affected. Even small changes in the knowledge base can have a significant impact on the abstraction hierarchy. RASTA then selectively regenerates the affected abstractions. Large-scale changes to the knowledge base may, of course, require regenerating most or all of RASTA’s abstractions; minor changes, however, should have minimal effects. In this way, RASTA keeps its abstractions synchronized with both its data and its knowledge bases. The abstraction state is saved so that new invocations of RASTA can use previous abstractions, obviating the need to start the abstraction process from scratch each time the system is invoked. The saved state is synchronized with the original source data so that, upon re-invocation, RASTA can determine whether later changes in the data require it to update its abstractions.
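Deciding which abstractions to regenerate after a knowledge-base or data change amounts to finding everything that depends, directly or transitively, on the changed element. The Python sketch below computes that set with a breadth-first walk over reversed dependency edges. The hierarchy, its abstraction names, and `affected_abstractions` are hypothetical, and the sketch does not model how RASTA inspects a Protégé-2000 change or retracts individual intervals.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical hierarchy: abstraction -> the inputs it is defined over.
HIERARCHY: Dict[str, List[str]] = {
    "hgb_state": ["hgb"],
    "wbc_state": ["wbc"],
    "platelet_state": ["plt"],
    "bone_marrow_toxicity": ["hgb_state", "wbc_state"],
    "therapy_hold": ["bone_marrow_toxicity"],
}


def affected_abstractions(hierarchy: Dict[str, List[str]], changed: Set[str]) -> Set[str]:
    """Return the changed abstractions plus every abstraction that transitively depends on
    a changed abstraction or primitive parameter."""
    dependents: Dict[str, List[str]] = {}
    for abstraction, inputs in hierarchy.items():
        for inp in inputs:
            dependents.setdefault(inp, []).append(abstraction)
    affected = {c for c in changed if c in hierarchy}  # primitives are not abstractions
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Editing the hemoglobin-state definition invalidates it and the abstractions above it;
# wbc_state and platelet_state can keep their saved results.
print(sorted(affected_abstractions(HIERARCHY, {"hgb_state"})))
# ['bone_marrow_toxicity', 'hgb_state', 'therapy_hold']
```

Passing a changed primitive parameter instead (for example `{"wbc"}`) yields the abstractions that read it and everything above them, which is the set whose saved state would need to be revisited.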

Discussion
RASTA was designed to act as the basis of a scalable architecture for performing knowledge-driven temporal reasoning with clinical data. It provides a temporal abstraction system in a package that is portable, efficient, and scalable. The algorithm that RASTA uses allows the task of generating abstractions to be distributed over many computers, allowing RASTA to work on very large data sets. The algorithm also supports a variety of configuration options, enabling the system to deal with a range of application requirements. We are currently experimenting with the RASTA system in an EON-based hypertension decision support system called ATHENA, which is deployed at the VA Palo Alto Health Care System [11]. We will soon deploy ATHENA at several other VA sites in the United States, at which time we hope to gain considerable experience in the evaluation of various RASTA deployment strategies. We are also researching the possibility of using RASTA as the temporal reasoning component of decision support systems that are to be used for chemical and biological surveillance [3]. The large data sets typical in these systems should prove a valuable test bed for RASTA.

Acknowledgements
This work has been supported, in part, by grant LM05708 from the National Library of Medicine, and by grants from FastTrack Systems, Inc. and the Space and Naval Warfare Center. We thank Valerie Natale for her valuable editorial comments.

References
1. Dolin RH. Modeling the Temporal Complexities of Symptoms. Journal of the American Medical Informatics Association 1995: 2(5): 323-331.
2. Shahar Y. Context-Sensitive Temporal Abstraction of Clinical Data. In: Intelligent Data Analysis in Medicine and Pharmacology. Boston: Kluwer Academic Publishers, 1997: 37-59.
3. Khan AI, Sage MJ. Biological and Chemical Terrorism: Strategic Plan for Preparedness and Response. Centers for Disease Control and Prevention, April 2000.
4. Musen MA, Tu SW, Das AK, and Shahar Y. EON: A Component-Based Approach to Automation of Protocol-Directed Therapy. Journal of the American Medical Informatics Association 1996: 3(6): 367-388.
5. O'Connor MJ, Tu SW, and Musen MA. Applying Temporal Joins to Clinical Databases. AMIA Annual Symposium 1999: 335-339.
6. O'Connor MJ, Tu SW, and Musen MA. Representation of Temporal Indeterminacy in Clinical Databases. AMIA Annual Symposium 2000: 615-619.
7. Nguyen JH, Shahar Y, Tu SW, Das A, and Musen MA. Integration of Temporal Reasoning and Temporal-Data Maintenance into a Reusable Database Mediator to Answer Abstract, Time-Oriented Queries: The Tzolkin System. Journal of Intelligent Information Systems 1999: 13(1/2): 121-145.
8. Shahar Y, and Musen MA. RÉSUMÉ: A Temporal-Abstraction System for Patient Monitoring. Computers and Biomedical Research 1993: 26: 255-273.
9. Musen MA, Fergerson RW, Grosso WE, Noy NF, Crubezy M, and Gennari JH. Component-Based Support for Building Knowledge-Acquisition Systems. Conference on Intelligent Information Processing of the International Federation for Information Processing World Computer Congress, Beijing, 2000.
10. Shahar Y, Chen H, Stites DP, Basso L, Kaizer H, Wilson DM, and Musen MA. Semi-automated Entry of Clinical Temporal-Abstraction Knowledge. Journal of the American Medical Informatics Association 1999: 6(6): 494-511.
11. Goldstein MK, Hoffman BB, Coleman RW, Musen MA, Tu SW, Advani A, Shankar R, and O’Connor MJ. Implementing Clinical Practice Guidelines while Taking Account of Changing Evidence: ATHENA DSS, an Easily Modifiable Decision Support System for Managing Hypertension in Primary Care. AMIA Annual Symposium 2000: 300-304.

Address for Correspondence
Martin O’Connor, Stanford Medical Informatics, Stanford University, MSOB X267, 251 Campus Drive, Stanford, CA 94305-5479, USA
