IPS: A System that Uses Machine Learning to Help Locate Patient Records for Clinical Research Gregory F. Cooper, M.D., Ph.D., Bruce G. Buchanan, Ph.D., Wendy Chapman, Ph.D., Paul Hanbury, B.S., Mehmet Kayaalp, M.D., M.S., Melissa Saul, M.S.
University of Pittsburgh, Pittsburgh, Pennsylvania
The IPS (Identifying Patient Sets) system assists clinical researchers in locating patient electronic records of interest for use in retrospective and prospective studies. Such records can include structured data, such as medication lists and laboratory results, as well as free-text data, such as dictated discharge summaries. The IPS system is designed to help researchers locate records that are difficult to find with a simple Boolean query, either because many of the relevant query terms are not easily identifiable or because the concept of interest is inherently complex. For example, it would be difficult to formulate a simple Boolean query that locates, with high precision and recall, the records of patients who experienced adverse drug reactions in the hospital.
From the list of attributes that IPS derives from E, the user selects those terms T that he or she believes are clinically meaningful in locating records of interest. IPS uses the terms in T and the training records in E to construct a simple Bayes model, which is applied to all the records in D that are not in E. The model rank-orders these records according to the probability that they are of interest. IPS makes it easy for the user to examine the most probable records and to designate (label) which are actually of interest. The records designated as being of interest are automatically added to E, and the process repeats. When the user is satisfied that no more records of interest remain (or can feasibly be identified), the process stops. At that point, E contains the records of interest that have been identified.
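The ranking step can be illustrated with a minimal sketch of a simple (naive) Bayes model over term presence. The abstract does not give the model's details, so several things here are assumptions made for illustration: each record is reduced to a set of attributes, the model is a Bernoulli naive Bayes with add-one smoothing, and the negative examples are records that have been labeled as not of interest. The function names are hypothetical and are not taken from IPS.

```python
import math


def train_simple_bayes(records, labels, terms):
    """Fit a Bernoulli naive Bayes model with add-one smoothing (an assumption).

    records: list of sets, each holding the attributes present in one record
    labels:  list of booleans, True if the record is of interest
    terms:   the user-selected terms T
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    prior_pos = (n_pos + 1) / (len(labels) + 2)
    cond = {}
    for t in terms:
        pos_with = sum(1 for r, y in zip(records, labels) if y and t in r)
        neg_with = sum(1 for r, y in zip(records, labels) if not y and t in r)
        # P(term present | of interest), P(term present | not of interest)
        cond[t] = ((pos_with + 1) / (n_pos + 2), (neg_with + 1) / (n_neg + 2))
    return prior_pos, cond


def probability_of_interest(record, model):
    """Posterior probability that a record is of interest under the model."""
    prior_pos, cond = model
    log_pos = math.log(prior_pos)
    log_neg = math.log(1 - prior_pos)
    for t, (p_pos, p_neg) in cond.items():
        if t in record:
            log_pos += math.log(p_pos)
            log_neg += math.log(p_neg)
        else:
            log_pos += math.log(1 - p_pos)
            log_neg += math.log(1 - p_neg)
    # Normalize in log space to avoid underflow when T is large.
    m = max(log_pos, log_neg)
    pos = math.exp(log_pos - m)
    neg = math.exp(log_neg - m)
    return pos / (pos + neg)


def rank_records(candidates, model):
    """Return record ids ordered from most to least probable."""
    return sorted(
        candidates,
        key=lambda rid: probability_of_interest(candidates[rid], model),
        reverse=True,
    )
```

In this sketch, the records at the head of the list returned by `rank_records` are the ones a user would review first; those confirmed as being of interest would be added to the training set before the model is refit.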
IPS uses a client-server architecture. The client is a relatively thin C++ application that runs in Windows and provides a user interface. The client communicates over the Internet (using TCP/IP) with a server running under NT. The server contains separate modules for handling I/O, inference, and database tasks. The server can be readily adapted to use different I/O streams and different databases.
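As a rough illustration of why separate server modules make it easy to swap databases, the sketch below composes a server from a storage module and an inference module behind small interfaces. It is written in Python for brevity and is not the IPS implementation; the class names, the `RANK` request string, and the keyword-overlap scoring are all hypothetical stand-ins.

```python
from typing import Dict, List, Protocol, Set


class RecordStore(Protocol):
    """Database module: any backend that can return records by id."""

    def fetch_all(self) -> Dict[str, Set[str]]: ...


class Inference(Protocol):
    """Inference module: orders record ids from most to least relevant."""

    def rank(self, records: Dict[str, Set[str]]) -> List[str]: ...


class InMemoryStore:
    """Stand-in backend; a real deployment would wrap a database here."""

    def __init__(self, records: Dict[str, Set[str]]):
        self._records = dict(records)

    def fetch_all(self) -> Dict[str, Set[str]]:
        return dict(self._records)


class KeywordInference:
    """Placeholder scorer: counts how many chosen terms a record contains."""

    def __init__(self, terms: Set[str]):
        self.terms = set(terms)

    def rank(self, records: Dict[str, Set[str]]) -> List[str]:
        return sorted(
            records, key=lambda rid: len(self.terms & records[rid]), reverse=True
        )


class Server:
    """I/O module: dispatches a request to the modules it was built with."""

    def __init__(self, store: RecordStore, inference: Inference):
        self.store = store
        self.inference = inference

    def handle(self, request: str) -> List[str]:
        if request == "RANK":
            return self.inference.rank(self.store.fetch_all())
        raise ValueError(f"unsupported request: {request}")
```

Because `Server` depends only on the two interfaces, replacing `InMemoryStore` with a class that reads from another database would not require changes to the inference or I/O code.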
IPS uses machine learning to help the user construct a model (query) to locate the patient records of interest. Model construction occurs incrementally within a feedback loop that involves (1) using the current IPS model to suggest records of interest, (2) labeling by the user of those records that are actually of interest, and (3) updating the IPS model based on the user's labeling.
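The three-step loop can be sketched as a small driver function. This is an illustration under stated assumptions rather than the IPS code: `fit`, `score`, and `ask_user` are hypothetical callbacks supplied by the caller, the batch size is arbitrary, and stopping when a reviewed batch yields no new records of interest is only a stand-in for the user deciding that the search is complete.

```python
def feedback_loop(pool, seeds, fit, score, ask_user, batch_size=2, max_rounds=10):
    """Grow a set of records of interest through repeated user labeling.

    pool:     dict mapping record id -> record (here, a set of attributes)
    seeds:    dict of initial records of interest
    fit:      callable(interesting, rejected) -> model
    score:    callable(model, record) -> number, higher means more likely
    ask_user: callable(record_id, record) -> bool, the user's label
    """
    interesting = dict(seeds)
    rejected = {}
    remaining = {rid: rec for rid, rec in pool.items() if rid not in interesting}

    for _ in range(max_rounds):
        if not remaining:
            break
        # (3) Update the model from all labels collected so far.
        model = fit(interesting, rejected)
        # (1) Use the current model to suggest the most promising records.
        ranked = sorted(
            remaining, key=lambda rid: score(model, remaining[rid]), reverse=True
        )
        found_any = False
        # (2) The user labels each suggested record.
        for rid in ranked[:batch_size]:
            rec = remaining.pop(rid)
            if ask_user(rid, rec):
                interesting[rid] = rec
                found_any = True
            else:
                rejected[rid] = rec
        if not found_any:
            break  # assumed stopping rule; in practice the user decides
    return interesting
```

A toy run might pass `fit=lambda pos, neg: set().union(*pos.values())` and `score=lambda model, rec: len(model & rec)`, so that records sharing more attributes with the current records of interest are suggested first.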
IPS has been used for approximately one year in limited testing by clinical researchers at the University of Pittsburgh. The records of interest identified so far have included those of patients who experienced adverse drug events and those of patients who returned to the emergency department because of post-operative pain.
To apply IPS, a user needs a set of records to be searched (S) and a few examples of records of interest (E). At the University of Pittsburgh, the records in S are extracted from the UPMC MARS data repository. The records are then de-identified by IPS to produce a set D of de-identified records. Within text records, for example, the de-identifier electronically blanks out information that might identify a patient, such as the patient's name. Set E contains a few sample records that are of interest. If no such examples are known, IPS helps the user browse D to locate a few initial records of interest; such browsing can involve preliminary Boolean searches. Based on E, IPS derives and lists those attributes (i.e., variable-value pairs) that distinguish records in E from those not in E.
1067-5027/01/$5.00 © 2001 AMIA, Inc.
This poster will illustrate the major aspects of the IPS system as applied to several patient-record retrievals that are of clinical interest.