Evaluating MetaMap's Text-to-Concept Mapping Performance

Henny Klein, PhD†, Marc Weeber, MA†, Lolkje T.W. de Jong-van den Berg, PhD†, Rein Vos, MD, PhD†‡

†Groningen University Institute for Drug Exploration, Department of Social Pharmacy and Pharmacoepidemiology, Groningen, The Netherlands; ‡Department of Health Sciences, University of Maastricht, The Netherlands

Natural Language Processing (NLP) systems used in the analysis of medical texts are mostly evaluated both as part of a larger system and with respect to a specific goal (document retrieval, decision support). Since NLP systems can be integrated as a module in different applications, an application-independent analysis and evaluation of their performance is more tailored to the needs of the application developer. We evaluated the performance of the MetaMap system [1] in mapping words in MEDLINE abstracts to medical concepts present in the UMLS Metathesaurus. More specifically, we analyzed the effects of the distinct user options MetaMap provides.

METHODS

Analysis of the mappings per phrase
Mapping structure: We counted the number of words in the phrases and the number of concepts.
Accuracy of the mapping: We counted the number of correct concepts, and the number of correctly mapped words per concept.
Evaluation: The measures used to evaluate MetaMap's performance were precision, coverage, and aggregation.

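$$\text{coverage} = \frac{N\ \text{of correctly mapped words}}{N\ \text{of words in phrases}} \times 100$$

$$\text{aggregation} = \frac{N\ \text{of correctly mapped words}}{N\ \text{of correct concepts}}$$

The printed formula for aggregation also carries a factor of 100, but the aggregation values reported under RESULTS (for example 1.147) are plain ratios of correctly mapped words per correct concept.

METAMAP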

MetaMap maps full texts to UMLS Metathesaurus concepts. The system uses underspecified syntactic analysis, lexical variant generation (abbreviation expansion, synonymy, derivational and inflectional morphology) based on the SPECIALIST Lexicon, and word order variation. A sentence is parsed into consecutive processing phrases, which are mapped to one or more concepts (for an example, see Table 1).

User options for the mapping process:
G  allow concept gaps
I  use free word order
D  prevent derivation
U  generate unique acronyms and abbreviations
A  generate no acronyms and abbreviations

The possible combinations of these options lead to 24 different settings. The performance of MetaMap in these settings was tested on 100 randomly chosen MEDLINE abstracts, with a total of 13426 words.
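The count of 24 settings can be checked with a short sketch. It assumes that G, I and D are independent on/off switches and that U and A are alternatives (neither, U, or A), which is what the setting labels in Table 1 suggest; the function name `enumerate_settings` is hypothetical and not part of MetaMap.

```python
from itertools import product


def enumerate_settings():
    """Return a label for every combination of the mapping options.

    Assumption: G, I and D are on/off switches, while U and A exclude
    each other (neither, U, or A), giving 2 * 3 * 2 * 2 = 24 settings.
    """
    labels = []
    for g, acro, i, d in product(["", "G"], ["", "U", "A"], ["", "I"], ["", "D"]):
        # Letter order follows the labels used in Table 1 (e.g. GUID, GAID).
        labels.append((g + acro + i + d) or "NONE")
    return labels


settings = enumerate_settings()
print(len(settings))
print(settings)
```

The setting with no option switched on is labelled NONE, as in Table 1.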

RESULTS

MetaMap produced 7239-8763 concepts. Averages: precision 94.35, coverage 63.55, aggregation 1.147.

Main effects of the options and their combinations
G    lower precision; higher coverage together with I
I    higher precision as well as coverage
D    higher precision and aggregation, low coverage
U/A  higher precision (A slightly better than U)

Best choices
AI, A and GAI give the best results. AI and A give high precision, GAI offers better coverage. Option A offers a relatively high performance on all three parameters.

Table 2: Measures per option setting, best results

options   precision   coverage   aggregation
AI        95.53       66.14      1.125
A         95.34       66.01      1.157
GAI       94.87       68.30      1.122

Table 1: Phrase mapping of "acute encephalitic state"

Mapping options                  Concept 1                 Concept 2
NONE, I, U, UI, A, AI            acute                     encephalitis
D, ID, UD, UID, AD, AID          acute                     -
G, GI, GU, GUI, GA, GAI          acute confusional state   encephalitis
GD, GID, GUD, GUID, GAD, GAID    acute confusional state   -

1091-8280/99/$5.00 © 1999 AMIA, Inc.

$$\text{precision} = \frac{N\ \text{of correct concepts}}{N\ \text{of concepts}} \times 100$$
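As a rough illustration, the three evaluation measures can be computed from four counts. This is a minimal sketch, not the authors' evaluation code: the class `MappingCounts` and the example counts are hypothetical, and aggregation is computed as a plain ratio, in line with the reported values near 1.1.

```python
from dataclasses import dataclass


@dataclass
class MappingCounts:
    """Counts for a set of mapped phrases (hypothetical container)."""
    words_in_phrases: int
    concepts: int
    correct_concepts: int
    correctly_mapped_words: int


def precision(c: MappingCounts) -> float:
    # Percentage of produced concepts that are correct.
    return 100.0 * c.correct_concepts / c.concepts


def coverage(c: MappingCounts) -> float:
    # Percentage of words in the phrases that are correctly mapped.
    return 100.0 * c.correctly_mapped_words / c.words_in_phrases


def aggregation(c: MappingCounts) -> float:
    # Correctly mapped words per correct concept (plain ratio, assumption).
    return c.correctly_mapped_words / c.correct_concepts


# Illustrative counts only: a three-word phrase mapped to two concepts,
# both assumed correct, each accounting for one word.
example = MappingCounts(
    words_in_phrases=3,
    concepts=2,
    correct_concepts=2,
    correctly_mapped_words=2,
)
print(precision(example), coverage(example), aggregation(example))
```

The functions assume non-zero denominators; a setting that produced no concepts at all would need separate handling.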

1. Aronson AR. The effect of textual variation on concept based information retrieval. In: Cimino JJ, editor. Proceedings of the 1996 AMIA Annual Fall Symposium. Philadelphia, PA: Hanley and Belfus; 1996; p. 373-377.
