Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease

Daniel C. Berrios, MD, MPH*, Andrew Kehler, PhD†, David K. Kim**, Victor L. Yu, MD††, Lawrence M. Fagan, MD, PhD*

*Stanford Medical Informatics, Stanford University School of Medicine, Stanford, CA. †SRI International, Menlo Park, CA. **Mayo Medical School, Rochester, MN. ††Div. of Infectious Disease, University of Pittsburgh School of Medicine, Pittsburgh, PA.

ABSTRACT

EVALUATION

The information needs of practicing clinicians frequently require textbook or journal searches.1 Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of retrieved documents that are relevant) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not greatly improve search precision.2
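The definition of precision given above can be made concrete with a minimal sketch. The document identifiers and relevance judgments below are hypothetical and serve only to illustrate the calculation.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        # Nothing was retrieved; report 0.0 by convention (an assumption here).
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical search: four chapters returned, only one of them relevant.
retrieved_ids = ["ch12", "ch40", "ch77", "ch103"]
relevant_ids = ["ch40", "ch150"]
print(precision(retrieved_ids, relevant_ids))  # 0.25
```

A search that returns many chapters sharing a keyword but few that answer the clinician's question yields a low value, which is the problem the markup is meant to address.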

We aim to demonstrate that ISAID is accurate and efficient. To assess accuracy, we will compare the consistency of the markup that the system proposes for each sentence with that proposed by infectious disease experts. We will test the hypothesis that ISAID is at least as consistent in its proposed indexing as MEDLINE indexers are with one another. To evaluate efficiency, we have arbitrarily chosen a target markup time of 1/6 of the time needed for manual indexing, or 1/2 hour or less per chapter.
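The text does not say which consistency measure will be used. As one possible illustration, the sketch below assumes a set-overlap measure (terms assigned by both indexers divided by terms assigned by either), which is a common way to express inter-indexer consistency. The markup terms are hypothetical.

```python
def indexing_consistency(terms_a, terms_b):
    """Set-overlap consistency: shared terms / all terms used by either indexer.

    This particular measure is an assumption; the source does not name one.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        # Two empty indexings are treated as fully consistent.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical markup for one sentence.
system_markup = {"pneumonia", "erythromycin", "Legionella"}
expert_markup = {"pneumonia", "erythromycin", "treatment"}
print(indexing_consistency(system_markup, expert_markup))  # 0.5
```

Under this assumption, the hypothesis would be supported if system-versus-expert scores were at least as high as scores computed between pairs of human indexers.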

Kim et al. have designed and built a prototype system (MYCIN II)3 for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this markup process is time-consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters).
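The scale of the indexing effort follows directly from these figures. The short calculation below uses only numbers stated in the text (218 chapters, about 3 person-hours each, and a target of 1/6 of the manual time); the variable names are illustrative.

```python
CHAPTERS = 218
MANUAL_HOURS_PER_CHAPTER = 3.0   # approximate figure from the text
TARGET_DIVISOR = 6               # target is 1/6 of the manual time

target_hours_per_chapter = MANUAL_HOURS_PER_CHAPTER / TARGET_DIVISOR
manual_total = CHAPTERS * MANUAL_HOURS_PER_CHAPTER
target_total = CHAPTERS * target_hours_per_chapter

print(target_hours_per_chapter)  # 0.5
print(manual_total)              # 654.0
print(target_total)              # 109.0
```

Meeting the target would reduce the effort for the whole textbook from roughly 654 to roughly 109 person-hours.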

DISCUSSION

We are designing a system that uses existing information-extraction tools in combination with query and text-markup models to allow indexing of medical information relatively quickly and simply. Through faster, more precise access to these knowledge sources, clinicians can reduce the amount of time they devote to information gathering, enhance the quality of information retrieved, and ultimately improve the quality of care they deliver to patients.

We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools4 to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.

References
1. Osheroff JA, Forsythe DE, Buchanan BG, Bankowitz RA, Blumenfeld BH, Miller RA. Physicians' information needs: analysis of questions posed during clinical teaching. Ann Intern Med 1991; 114(7): 576-81.
2. Hersh WR. Evaluation of Meta-1 for a concept-based approach to the automated indexing and retrieval of bibliographic and full-text databases. Med Decis Making 1991; 11: S120-S124.
3. Kim DK, Fagan LM, Jones KT, Berrios DC, Yu VL. MYCIN II: Design and Implementation of a Therapy Reference with Complex Content-Based Indexing. Proceedings of the 1998 AMIA Fall Symposium, Orlando, FL. (in press).
4. Soderland S, Aronow D, Fisher D, Aseltine J, Lehnert W. Machine Learning of Text Analysis Rules for Clinical Records. Technical Report No. TE-39, Center for Intelligent Information Retrieval, U. of Massachusetts, 1995.

SYSTEM

The components of the ISAID system include text parsing, information extraction (IE), and markup authoring modules. The text parser marks the boundaries of text blocks for document navigation and markup, performs multi-word concept recognition, and parses sentences into syntactic buffers. The IE module examines blocks of text and proposes text markup based on an extraction grammar, which, in turn, is based on a model of likely queries. The required semantic and part-of-speech knowledge is obtained from the UMLS Metathesaurus through its web application programming interface. The markup authoring module allows the indexer to select and navigate a document, and to create, view, and/or edit text markup.
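The flow through the first two modules can be illustrated with a toy sketch. This is not the ISAID implementation: the lexicon stands in for knowledge that the system obtains from the UMLS Metathesaurus, the single extraction rule stands in for the extraction grammar, and all names (TOY_LEXICON, Markup, propose_markup) and the sample sentences are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for semantic knowledge from the UMLS Metathesaurus.
TOY_LEXICON = {
    "legionella pneumonia": "disease",
    "erythromycin": "drug",
}


@dataclass
class Markup:
    sentence_index: int
    concept: str
    semantic_type: str


def split_sentences(text):
    """Text parsing: mark sentence boundaries with a naive punctuation split."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize_concepts(sentence):
    """Multi-word concept recognition; longer lexicon entries are checked first."""
    lowered = sentence.lower()
    found = []
    for term in sorted(TOY_LEXICON, key=len, reverse=True):
        if term in lowered:
            found.append((term, TOY_LEXICON[term]))
    return found


def propose_markup(text):
    """Information extraction: propose markup for sentences matching a toy rule."""
    proposals = []
    for i, sentence in enumerate(split_sentences(text)):
        concepts = recognize_concepts(sentence)
        types = {semantic_type for _, semantic_type in concepts}
        # Toy rule modeled on a likely query ("which drug for disease X?"):
        # a sentence qualifies only if it mentions both a disease and a drug.
        if {"disease", "drug"} <= types:
            proposals.extend(Markup(i, c, t) for c, t in concepts)
    return proposals


# Illustrative text only, not clinical guidance.
sample = ("Erythromycin is discussed for Legionella pneumonia. "
          "Supportive care is described in a later section.")
for markup in propose_markup(sample):
    print(markup)
```

In a semiautomated workflow of the kind described, such proposals would be passed to the markup authoring module, where the indexer can accept, edit, or discard them.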

1091-8280/98/$5.00 © 1998 AMIA, Inc.
