
Using RxNorm and NDF-RT to Classify Medication Data Extracted from Electronic Health Records: Experiences from the Rochester Epidemiology Project

Jyotishman Pathak, PhD¹, Sean P. Murphy¹, Brian N. Willaert¹, Hilal M. Kremers, MD¹, Barbara P. Yawn, MD, MSc², Walter A. Rocca, MD, MPH¹, Christopher G. Chute, MD, DrPH¹
¹Mayo Clinic, Rochester, MN; ²Olmsted Medical Center, Rochester, MN

Abstract
RxNorm and the National Drug File-Reference Terminology (NDF-RT), published by the National Library of Medicine (NLM) and the Department of Veterans Affairs (VA), respectively, are two publicly available federal medication terminologies. In this study, we evaluate the applicability of RxNorm and NDF-RT for the extraction and classification of medication data retrieved, using structured querying and natural language processing techniques, from the electronic health records (EHRs) of two medical centers within the Rochester Epidemiology Project (REP). Specifically, we explore how mappings between RxNorm concept codes and NDF-RT drug classes can be leveraged for the hierarchical organization and grouping of REP medication data, identify gaps and coverage issues, and analyze NLM's recently released NDF-RT Web service API. We conclude that RxNorm and NDF-RT can be applied together to classify medication data extracted from multiple EHR systems, although several issues and challenges remain to be addressed. We further conclude that the Web service APIs developed by NLM provide useful functionality for such activities.

Introduction
Standardized biomedical terminologies play an important role in enabling consistent representation and interoperability of healthcare data and information systems3. Within the realm of pharmaceutical drugs and medications, RxNorm1 and NDF-RT2 are two publicly available federal medication terminologies. The goal of RxNorm is to allow systems that use different drug nomenclatures to share and exchange data efficiently. It links standard clinical drug names to the drug terminologies commonly used in pharmacy management and drug interaction software, including First DataBank4, Micromedex5, and Multum6. While it provides extensive coverage of drug entities, RxNorm does not at present offer clinical researchers a systematic way to aggregate or classify clinical drugs or active ingredients for analysis. NDF-RT, on the other hand, includes information about drugs and ingredients and also contains a multi-axial hierarchical knowledge structure that classifies ingredients and drug products. Consequently, several recent research efforts, including our own prior work7,8, have studied different aspects of the mappings between RxNorm and NDF-RT for classifying medication data, and their applications in information exchange9, linkage10, and querying11. The primary objective of this study is to investigate the applicability of RxNorm and NDF-RT for representing and classifying medication data extracted from Electronic Health Record (EHR) systems at two medical centers participating in the Rochester Epidemiology Project (REP12,13): Mayo Clinic and Olmsted Medical Center (OMC). In particular, we develop approaches for structured querying of out-patient EHR data at OMC, and for unstructured querying via natural language processing (NLP) of clinical notes at Mayo Clinic, to extract patient medication information and represent it using RxNorm.
This RxNorm-coded data is then classified using NDF-RT drug classes by leveraging NLM's recently released NDF-RT Web service Application Programming Interface (API)14. Our evaluation of these two terminology standards addresses the following aspects: (1) Coverage: What percentage of REP medication data can be adequately classified using NDF-RT drug classes? (2) Representation: What are the NLP gaps and challenges in extracting medication data and representing it using RxNorm? (3) Accessibility: What are the benefits and issues of using NLM's NDF-RT Web services API? (4) Practical applications and requirements: What are the practical needs for drug classification, at least within the context and scope of REP, and how can RxNorm and NDF-RT address them? Our results indicate that RxNorm and NDF-RT can be applied together to classify medication data extracted from multiple EHR systems using structured queries and natural language processing techniques. However, several issues with respect to coverage of, and mapping between, these two terminologies necessitate further investigation. Finally, our evaluation of NLM's Web services APIs for accessing and querying RxNorm and NDF-RT found the functionality they provide to be robust and highly useful for building integrative software applications.


Background
RxNorm. RxNorm is a nomenclature for clinical drugs produced by the U.S. National Library of Medicine (NLM). It contains the names of prescription and many nonprescription formulations approved for human use, primarily in the USA. An RxNorm clinical drug name reflects the active ingredients, strengths, and dose form that make up the drug. When any of these elements varies, a new RxNorm drug name is created as a separate concept, identified by an RxNorm concept unique identifier (RxCUI). To distinguish between such drug entities, RxNorm uses "term types" (TTYs) that represent categories of generic and branded drugs (Figure 1). Specifically, RxNorm uses five categories for generic drugs: ingredient alone (Ingredient, IN), an ingredient form that may or may not be clinically active (Precise Ingredient, PIN), ingredient plus strength (Clinical Drug Component, SCDC), ingredient plus dose form (Clinical Drug Form, SCDF), and ingredient plus strength and dose form (Clinical Drug, SCD). Analogously, there are four categories for brand-name drug concepts: brand name alone (Brand Name, BN), brand name plus strength (Branded Drug Component, SBDC), brand name plus dose form (Branded Drug Form, SBDF), and brand name plus strength and dose form (Branded Drug, SBD). These RxNorm entities are related to each other by a well-defined set of named relationships that allows traversal of the RxNorm graph and retrieval of relevant information. For example, as shown in Figure 1, Childrens Tylenol (BN) and Acetaminophen (IN) are linked via the "tradename_of" and "has_tradename" relationships. Furthermore, RxNorm contains mappings from its concepts to one or more concepts in external drug terminologies and databases, such as First DataBank and Multum. Since 2008, NLM has provided the RxNav15 Web services to access RxNorm information.

NDF-RT. Like RxNorm, NDF-RT includes information about drugs and ingredients, but it also contains a multi-axial hierarchical knowledge structure for drug classes. In particular, NDF-RT uses a description logic-based formal reference model that groups drugs and ingredients into high-level classes for Chemical Structure (e.g., Acetanilides), Mechanism of Action (e.g., Prostaglandin Receptor Antagonists), Physiological Effect (e.g., Decreased Prostaglandin Production), drug-disease relationships describing Therapeutic Intent (e.g., Pain), Pharmacokinetics, describing the mechanisms of absorption and distribution of an administered drug within the body (e.g., Hepatic Metabolism), and legacy VA-NDF classes for Pharmaceutical Preparations (VHA Drug Class; e.g., Non-Opioid Analgesic). Figure 1 shows a graphical representation of the underlying NDF-RT information model, in which the hexagons represent multiple-inheritance reference hierarchies and the rectangles correspond to named sets of concepts, each representing a level of abstraction used to describe medications. As shown in Figure 1, the mappings between RxNorm and NDF-RT are specified primarily between ingredients and between clinical drugs (blue dotted arrows). Creating such linkages at the ingredient level versus the clinical drug level has ramifications for the overall mapping between RxNorm and NDF-RT concepts, as well as for the result sets retrieved when the terminologies are queried. There are several reasons for this discrepancy. First, RxNorm and NDF-RT are different terminologies maintained by different organizations. Although both contain drug ingredients and packaged drugs (and can be linked through either), the boundaries of their content scope differ. Furthermore, until recently, RxNorm and NDF-RT had different update cycles, which often caused their content to fall out of sync. Second, because the relationship between drugs and ingredients is developed separately in each terminology, there are many issues with respect to the maintenance and curation of the linkages, a topic that we investigate further in this study.

Figure 1 Mappings between RxNorm and NDF-RT (blue dotted arrows; figure adapted from1,2)

Rochester Epidemiology Project. The Rochester Epidemiology Project (REP12,13) is a collaborative effort among several healthcare providers in Olmsted County, MN, that provides a unique medical records-linkage system encompassing the care delivered to residents of Olmsted County, including the City of Rochester. Funded by the U.S. National Institutes of Health for over 40 years, REP is a unique population-based research resource: it enables the medical providers in Olmsted County to share patient medical records and provides access to accurate incidence data for almost any serious condition, supporting population-based analytic studies of disease causes and outcomes. Since the project's inception in 1966, REP has supported a wide array of epidemiological and clinical research studies that have resulted in more than 1800 peer-reviewed publications. In this study, we investigate RxNorm and NDF-RT for the representation and classification of REP medication data to facilitate data exchange and interoperability. In particular, we developed structured queries and, using Mayo's cTAKES platform16, natural language processing techniques to extract drug and medication data from the EHR systems of two REP providers and standardize them using RxNorm. The RxNorm-encoded data is then classified using NDF-RT drug classes by leveraging NLM's NDF-RT Web services API. Figure 2 shows a high-level description of our process; we discuss the details of our approach and findings in the remainder of this paper.
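To make the RxNorm term types and named relationships described above concrete, the following is a minimal Python sketch of a tiny in-memory RxNorm-like graph. The `RxGraph` class, the `TERM_TYPES` table, and the `INVERSE` map are hypothetical illustrations, not part of RxNorm or RxNav; only the two relationship names and the Acetaminophen/Childrens Tylenol pair come from the text, and real RxNorm data would be keyed by RxCUI rather than by name.

```python
from collections import defaultdict

# (branded?, includes strength?, includes dose form?) for each term type
# listed above; PIN is an ingredient-level generic type like IN.
TERM_TYPES = {
    "IN": (False, False, False),
    "PIN": (False, False, False),
    "SCDC": (False, True, False),
    "SCDF": (False, False, True),
    "SCD": (False, True, True),
    "BN": (True, False, False),
    "SBDC": (True, True, False),
    "SBDF": (True, False, True),
    "SBD": (True, True, True),
}

# Only the relationship pair discussed in the text; other pairs would go here.
INVERSE = {"has_tradename": "tradename_of", "tradename_of": "has_tradename"}


class RxGraph:
    """Toy concept graph: nodes carry a term type, edges a named relationship."""

    def __init__(self):
        self.tty = {}                  # concept name -> term type
        self.edges = defaultdict(set)  # (source, relationship) -> targets

    def add_concept(self, name, tty):
        if tty not in TERM_TYPES:
            raise ValueError(f"unknown term type: {tty}")
        self.tty[name] = tty

    def relate(self, source, rela, target):
        # Store the relationship and its inverse so traversal works both ways.
        self.edges[(source, rela)].add(target)
        self.edges[(target, INVERSE[rela])].add(source)

    def follow(self, source, rela):
        return sorted(self.edges.get((source, rela), ()))

    def is_branded(self, name):
        return TERM_TYPES[self.tty[name]][0]


g = RxGraph()
g.add_concept("Acetaminophen", "IN")
g.add_concept("Childrens Tylenol", "BN")
g.relate("Acetaminophen", "has_tradename", "Childrens Tylenol")
print(g.follow("Childrens Tylenol", "tradename_of"))  # ['Acetaminophen']
```

Storing each relationship together with its inverse mirrors the way the text describes the BN/IN pair as linked in both directions.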

Figure 2 Rochester Epidemiology Project medication data extraction workflow

Related Work. Several recent research reports have highlighted important issues and challenges in using NDF-RT for clinical research and interoperability. Bodenreider et al.17 focused on determining the anticoagulation status of patients from their lists of prescribed medications, using NDF-RT as the underlying drug class terminology. In particular, this work concentrated on leveraging the description logic (DL)-based representation of NDF-RT to infer additional drug-class relationships using the Legacy VA classes and External Pharmacologic classes. In the process, the authors not only had to make significant modifications to, and re-engineer, NDF-RT's DL representation, but also encountered missing drug-class membership information in NDF-RT. In another study, Palchuk et al.11 constructed a hierarchy of NDF-RT drug classes populated with drug and medication information from another standardized drug terminology, RxNorm, using data from patients' electronic medical records. As with Bodenreider et al.17, the authors had to perform significant re-engineering to map, and subsequently classify, RxNorm drug products using the Legacy VA classes from NDF-RT. They found this process extremely onerous, and proposed evolving RxNorm toward an interface terminology with hierarchical and categorical organization. In our own prior work7,8, we investigated similar issues in mapping and classifying drugs and medication products from RxNorm using NDF-RT's multi-axial classification. We found several cases in which the mappings were incomplete and, on many occasions, semantically and clinically inconsistent. In summary, there is a growing interest and need within the clinical research and informatics community to leverage federal medication terminologies such as RxNorm and NDF-RT, and the overarching goal of this study is to investigate and highlight some of the existing issues and challenges in the context of extracting medication data from multiple EHR systems.

Materials and Software. The following materials and software were used for this study:

• RxNorm January 3rd, 2011 Full Update release data, consistent with the 2010AB UMLS Metathesaurus. This dataset included 19,057 active SCD concepts, 16,370 active SBD concepts, 4,943 active IN concepts, 1,547 active PIN concepts, 14,803 active BN concepts, and 100 active DF concepts.
• NDF-RT February 7th, 2011 release data, consistent with the RxNorm January 3rd, 2011 release. This dataset comprised a total of 45,149 NDF-RT concepts, including drugs, ingredients, dose forms, and drug classes.
• Medication history recorded in out-patient clinical notes (referred to as Orders97) for 212,974 patients, retrieved from Mayo Clinic's EHR system for the period January 2004 to October 2010. Because clinical note text is unstructured, natural language processing techniques were applied to extract the medication information (see Methods). The dataset contained more than 180,000 unique mentions of drugs and medications and was obtained by processing approximately 5 million rows of out-patient clinical note data.
• Out-patient prescription data for 105,151 unique patients, retrieved from OMC's EHR system (based on Microsoft® SQL Server) for the period August 2002 to November 2010. The dataset contained more than 1,375 unique mentions of drugs and medications; the medication information was extracted primarily via SQL queries.
• The February 9th, 2011 release of NLM's RxNav Web services API.
• The February 16th, 2011 release of NLM's NDF-RT Web services API.

Methods
As mentioned above, Figure 2 shows a high-level graphical representation of our REP medication data processing workflow. In brief, to retrieve medication data from OMC's EHR system, we designed a set of SQL queries. OMC codes these data using a commercial drug database, Multum; hence, we used the mappings between RxNorm and Multum made accessible via the RxNav API to represent OMC's data using RxNorm codes. To retrieve medication data from Mayo's EHR system, we used the cTAKES16 NLP platform to implement natural language processing techniques for the unstructured clinical notes in Orders97. These data were likewise represented in RxNorm through the RxNav Web services. The entire collection of RxNorm-encoded REP data was then categorized under NDF-RT drug classes by invoking NLM's NDF-RT Web services API. We describe this process in more detail next.

Using Structured Querying for Extracting Olmsted Medical Center's Medication Data. Prescription data for 105,151 unique patients were retrieved from Olmsted Medical Center's (OMC) EHR system (InteGreat IC-Chart) through a scheduled job that queries the data directly from the EHR source database tables (stored in a Microsoft® SQL Server database). The data spanned August 2002 to November 2010 and included detailed prescription information such as medication name and description, Multum drug code, NDC number, strength, form, route, quantity, units, frequency, refills, dates, and prescriber. These data were supplemented with drug brand and generic names via a drug lookup table derived from Multum. To represent these data using RxNorm, we mapped the Multum codes to RxCUIs (RxNorm Concept Unique Identifiers) using the RxNav API. For example, Hydrogen Peroxide 300 MG/ML Topical Solution, with Multum code 16282, was mapped to RxCUI 91348.

Using Natural Language Processing for Extracting Mayo Clinic's Medication Data.
To extract drug mentions from Mayo's Orders97 clinical notes data, we first created a dictionary of more than 265,000 terms, each identified uniquely by an RxCUI, to support look-up against RxNorm. The dictionary also included RxNorm codes that NLM had designated as "obsolete", because the patient corpus contained drug information more than a decade old. We further supplemented the dictionary with 1717 additional terms, primarily drug misspellings and abbreviations not available in RxNorm. We implemented the dictionary using open-source Apache Lucene (http://lucene.apache.org) along with the RxNav Web services API. For the extraction process, we used Mayo's open-source cTAKES16 NLP platform. In particular, only those terms from the Orders97 data composed of three or fewer tokens were considered for a match in the dictionary. (A token is any text surrounded by blanks, except that punctuation characters are treated as tokens as well.) We experimented with increasing the number of tokens considered to seven, but our analysis showed that the difference between three and seven tokens in identifying appropriate drug mentions was negligible, whereas limiting the token permutations to three helped minimize the computation time required for text processing. Each term in the dictionary was run through a tokenizer that separates it into distinct tokens. Figure 3 shows pseudocode for this process, which automatically discovers drug mentions and relevant attributes such as dosage, route, form, frequency, duration, and drug change status.

Figure 3 cTAKES NLP pseudocode for processing Orders97 clinical notes data

Classification using NDF-RT. Once the data from the EHR sources at Mayo Clinic and OMC were represented using RxNorm, our next step was to classify and categorize the medication information using NDF-RT drug classes. Figure 4 shows the pseudocode of our algorithm, which achieves this goal using NLM's RxNav and NDF-RT Web service API calls.
In brief, our objective was to classify the REP drug and medication data using NDF-RT's legacy VHA Drug Classes (Figure 1). This decision was based on discussions with clinician experts participating in REP, who concluded that the legacy classes provide useful information for organizing medication usage and supporting decisions about it in a clinical care setting. In essence, the algorithm explores the linkages between the clinical drug and ingredient concepts of RxNorm and NDF-RT (the blue dotted arrows in Figure 1). However, such a traversal is not always trivial, owing to misspelled drug names, missing explicit relationships between RxNorm term types, and gaps in coverage across the two terminologies. As our previous work7 showed, approximately 54% of RxNorm drug concepts (Semantic Clinical Drugs) had no corresponding NDF-RT clinical drug concept, and approximately 45% of drug concepts in NDF-RT were missing from RxNorm, mostly because of differences in dosage, strength, and route or form. Although these numbers are based on the 2008 releases of RxNorm and NDF-RT, they nevertheless demonstrate significant issues for cross-linkage between the two terminologies. Taking these aspects into consideration, our algorithm adopts a two-stage approach. In Stage I, it traverses the direct linkages from RxNorm SCD and IN concepts to the corresponding clinical drug and ingredient concepts in NDF-RT to identify an appropriate VHA Drug Class. If this step fails, because either the mappings or the corresponding concepts are missing, Stage II is pursued: the algorithm uses the chemical ingredient information available for the drug product to assign NDF-RT drug classes.
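Step 3 of the pseudocode in Figure 4 sorts the concepts related to an RxCUI by term type before any NDF-RT lookup. A minimal Python sketch of that step follows, assuming the JSON shape returned by the current public RxNav REST endpoint `/REST/rxcui/{rxcui}/allrelated.json` (an `allRelatedGroup` object whose `conceptGroup` entries carry a `tty` and an optional `conceptProperties` list). That endpoint postdates the 2011 services used in this study, so the URL and field names should be checked against NLM's current documentation; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

RXNAV_REST = "https://rxnav.nlm.nih.gov/REST"  # assumed current base URL

DRUG_LEVEL = {"SCD", "SBD"}         # candidates for fitByDrugPackList
INGREDIENT_LEVEL = {"SCDC", "PIN"}  # candidates for fitByIngredientList


def fetch_all_related(rxcui):
    """Network call: all concepts related to an RxCUI, as parsed JSON."""
    url = f"{RXNAV_REST}/rxcui/{rxcui}/allrelated.json"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def sort_by_term_type(payload):
    """Split related concepts into drug-level RxCUIs, ingredient-level
    RxCUIs, and dose-form names (Steps 3a-3c of the pseudocode)."""
    drug_pack, ingredients, dose_forms = [], [], []
    groups = payload.get("allRelatedGroup", {}).get("conceptGroup", [])
    for group in groups:
        # Groups with no members may omit "conceptProperties" entirely.
        for concept in group.get("conceptProperties", []):
            tty = concept.get("tty")
            if tty in DRUG_LEVEL:
                drug_pack.append(concept["rxcui"])
            elif tty in INGREDIENT_LEVEL:
                ingredients.append(concept["rxcui"])
            elif tty == "DF":
                dose_forms.append(concept["name"])
    return drug_pack, ingredients, dose_forms
```

Keeping the parsing separate from the network call means `sort_by_term_type` can be exercised offline on a hand-built payload, while `sort_by_term_type(fetch_all_related(rxcui))` would process a live response.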


Step 1: For a given RxCUI, find the re-mapped concept identifiers using the "findRemapped" RxNav API call.
    a. If more than one identifier is returned, use the last identifier.
Step 2: Query the RxCUI with "getAllRelatedInfo" via the RxNav API to retrieve its RxNorm properties.
    a. If the result is empty, use "getSpellingSuggestions" via the RxNav API to find suggestions for the full drug text.
        i. Use "findRxcuiByString" via the RxNav API to retrieve an RxCUI for the suggested drug name.
        ii. If the result is still empty, use the brand name extracted from the full drug text, if available, with "getAllRelatedInfo" in the RxNav API.
            - If still empty, repeat Step 2 using the first word of the drug name.
            - Otherwise, capture the concept node entry.
    b. Otherwise, capture the concept node entry.
Step 3: Check the contents of the concept node entry:
    a. If entries exist for "SCD" or "SBD", add them to fitByDrugPackList.
    b. If entries exist for "SCDC" or "PIN", add them to fitByIngredientList.
    c. If an entry exists for "DF", store its value as drugForm.
Step 4: For each entry in fitByDrugPackList (Stage I):
    a. Use "findConceptsByID" in the NDF-RT API to retrieve NDF-RT data.
    b. For each NDF-RT concept node (NUI) returned, invoke findVAClass.
    c. If no ndfrtCode is found, apply Steps 4a-4b to each entry in fitByIngredientList (Stage II), passing drugForm to findVAClass.

findVAClass(NUI):
Step 1: Get the list of concept properties for the NUI via "getConceptProperties" in the NDF-RT API.
    a. For each node returned, use "getAllInfo" in the NDF-RT API.
        i. If the node is at the "VA Product" level, find all its child and sibling nodes.
        ii. For each such node, get its concept properties via "getConceptProperties" in the NDF-RT API.
        iii. If a "VA Class" property is available, capture it as ndfrtCode.

Figure 4 Pseudocode for NDF-RT drug class assignment using NLM's RxNav and NDF-RT APIs

In particular, for a given drug product in RxNorm, Stage II first identifies all the RxNorm and NDF-RT ingredient concepts for that product. It then determines the drug product(s) in NDF-RT that contain only those NDF-RT ingredient concepts, by traversing the child and sibling nodes in the hierarchy, and extracts the corresponding VHA Drug Classes.

We illustrate these steps with an example. The drug mention text 12 HR Contac 12 Hour Allergy 1 MG Extended Release Tablet corresponds to RxCUI 598047 in RxNorm. However, this RxCUI is obsolete. Via the "findRemapped" RxNav API call, the algorithm therefore finds the newly remapped RxNorm code, which in this case is RxCUI 857421 (Clemastine Fumarate 1.34 MG [Contac 12 Hour Allergy]). This code has the RxNorm term type SBD (Semantic Branded Drug) and hence has no direct linkage to an NDF-RT clinical drug concept. Its corresponding SCD (Semantic Clinical Drug) concept, 12 HR Clemastine Fumarate 1.34 MG Extended Release Tablet (RxCUI 857420), is linked to an NDF-RT clinical drug concept, but that concept is not classified under any VHA Drug Class. Consequently, Stage II of the algorithm identifies the PIN (Precise Ingredient: Clemastine Fumarate, RxCUI 142430) and the SCDC (Semantic Clinical Drug Component: Clemastine Fumarate 1.34 MG, RxCUI 857400) and checks whether any sibling or child nodes containing the same ingredient form can be leveraged for classification. In this case, the SCD Clemastine 1 MG / Phenylpropanolamine 75 MG Extended Release Tablet (RxCUI 857404) is returned as a match; it is grouped under the VHA Drug Class [RE501] ANTIHISTAMINE/DECONGESTANT. The technique therefore assigns the same drug class to 12 HR Contac 12 Hour Allergy 1 MG Extended Release Tablet.
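The two-stage logic and the Clemastine example can be sketched in Python over in-memory tables. This is not the authors' implementation: the `Lookups` tables are hypothetical stand-ins for the RxNav and NDF-RT Web service calls, ingredient names are assumed to be pre-normalized (so Clemastine Fumarate and Clemastine compare equal), dose-form handling is omitted, and the first matching product wins. Because the description says matching products contain "only those" ingredients while the worked example matches a two-ingredient product, the sketch exposes both readings through an `exact` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Lookups:
    """Hypothetical stand-ins for the Web service calls used in Figure 4."""
    remapped: dict = field(default_factory=dict)         # obsolete -> current RxCUI
    tty: dict = field(default_factory=dict)              # RxCUI -> term type
    related: dict = field(default_factory=dict)          # RxCUI -> related RxCUIs
    direct_va_class: dict = field(default_factory=dict)  # SCD/SBD -> VA class
    ingredient: dict = field(default_factory=dict)       # PIN/SCDC/IN -> ingredient
    products: list = field(default_factory=list)         # (ingredient set, VA class)


DRUG_LEVEL = ("SCD", "SBD")
INGREDIENT_LEVEL = ("SCDC", "PIN", "IN")


def assign_va_class(rxcui, lk, exact=False):
    """Return (va_class, stage) for an RxCUI, or (None, None) if unclassified."""
    rxcui = lk.remapped.get(rxcui, rxcui)  # follow remapping of obsolete codes
    concepts = [rxcui] + lk.related.get(rxcui, [])

    # Stage I: direct RxNorm drug -> NDF-RT clinical drug -> VA class.
    for c in concepts:
        if lk.tty.get(c) in DRUG_LEVEL and c in lk.direct_va_class:
            return lk.direct_va_class[c], "I"

    # Stage II: match NDF-RT products on the drug's ingredients.
    wanted = frozenset(
        lk.ingredient[c]
        for c in concepts
        if lk.tty.get(c) in INGREDIENT_LEVEL and c in lk.ingredient
    )
    if not wanted:
        return None, None
    for ingredients, va_class in lk.products:
        matched = ingredients == wanted if exact else wanted <= ingredients
        if matched:
            return va_class, "II"
    return None, None


if __name__ == "__main__":
    lk = Lookups(
        remapped={"598047": "857421"},
        tty={"857421": "SBD", "857420": "SCD", "142430": "PIN", "857400": "SCDC"},
        related={"857421": ["857420", "142430", "857400"]},
        ingredient={"142430": "clemastine", "857400": "clemastine"},
        products=[(frozenset({"clemastine", "phenylpropanolamine"}), "RE501")],
    )
    print(assign_va_class("598047", lk))  # ('RE501', 'II')
```

With `exact=True` the same example returns `(None, None)`, which makes the difference between the two readings of Stage II easy to see.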
Results

Table 1 Sample list of vaccines missing from RxNorm
• DTaP/IPV/Hib Vaccine
• Haemophilus Influenzae Type b (Hib) Vaccine
• H1N1 swine flu vaccine
• Meningococcal Vaccine
• Inactivated polio vaccine
• MMRV (measles, mumps, rubella, varicella) vaccine
• Typhoid vaccine
• Hepatitis A vaccine
• Fluvax vaccine

Using RxNorm for representing REP medication data. Because the OMC medication data were obtained through SQL queries, our main focus for these data was to evaluate the accuracy and consistency of the mapping between RxNorm and Multum. To this end, we randomly selected 500 drug mentions from the 50 most frequently administered drugs in the OMC data that were coded in Multum, and manually evaluated their mapping to RxCUIs via the RxNav API. An experienced pharmacist performed this evaluation, focusing specifically on medication names and their descriptions. All 500 mappings were accurate, indicating that the curated mappings, at least between RxNorm and Multum, are of high quality and consistency.
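The OMC mappings evaluated above were obtained through RxNav. As an offline alternative, here is a minimal Python sketch that loads selected fields of a local RXNCONSO.RRF file (from the RxNorm release files) into SQLite and looks up RxCUIs by Multum code. It assumes that the file contains Multum rows with source abbreviation `MMX`, that the Multum code recorded at OMC corresponds to the CODE field of those rows, and that only atoms with SUPPRESS = `N` should be used; all three are assumptions to verify against the RxNorm release documentation, and the function names are hypothetical.

```python
import sqlite3

# Positions of the RXNCONSO.RRF fields used here (pipe-delimited file):
# RXCUI=0, SAB=11, TTY=12, CODE=13, STR=14, SUPPRESS=16
FIELDS = {"rxcui": 0, "sab": 11, "tty": 12, "code": 13, "str": 14, "suppress": 16}


def load_rxnconso(conn, lines):
    """Load the selected RXNCONSO fields from an iterable of RRF lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rxnconso "
        "(rxcui TEXT, sab TEXT, tty TEXT, code TEXT, str TEXT, suppress TEXT)"
    )
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("|")
        rows.append(tuple(parts[i] for i in FIELDS.values()))
    conn.executemany("INSERT INTO rxnconso VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def multum_to_rxcuis(conn, multum_code, source="MMX"):
    """RxCUIs whose unsuppressed atoms from `source` carry the given code."""
    cur = conn.execute(
        "SELECT DISTINCT rxcui FROM rxnconso "
        "WHERE sab = ? AND code = ? AND suppress = 'N' ORDER BY rxcui",
        (source, multum_code),
    )
    return [row[0] for row in cur]
```

Typical use would open the release file (its path is an assumption) with `open("RXNCONSO.RRF", encoding="utf-8")`, pass the file object to `load_rxnconso`, and then call `multum_to_rxcuis(conn, code)` for each Multum code; codes that return an empty list are the ones needing manual review.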

Processing the Orders97 data from Mayo's EHR was more involved. Specifically, we extracted medication information for 212,974 unique patients using the cTAKES NLP platform and mapped drug and medication mentions to RxCUIs using the RxNav API. We analyzed 4,964,022 rows of drug-related mentions for these 212,974 patients in the Orders97 data (each row contained one or more mentions). The cTAKES pipeline identified 181,727 unique terms as valid drug-related mentions, although many of these were not in fact drug-related (e.g., Eyeglasses, Hospital bed). This issue can be attributed both to the design and implementation of the cTAKES medication extraction module and to the intrinsic way in which Orders97 stores data. Furthermore, we found that even though RxNorm provides broad coverage of commonly prescribed drugs and medications, its coverage of commonly administered vaccines is not robust.

As an example, Table 1 shows a sample of vaccines for which no corresponding RxNorm codes were found while processing the Orders97 data (this list was identified manually from Orders97). Specifically, we identified 103 distinct vaccine-related terms in our Orders97 dataset, of which 35 had recognizable terms in RxNorm, although variations of those terms were missing from RxNorm. For example, the text span "H1NI" produced no hits, whereas the text span "Influenza A (H1N1) Vaccine 2009" had a corresponding RxCUI. In addition, our investigation showed that drug and medication terms commonly occurring in the Orders97 dataset as abbreviations, hyphenations, or aliases (e.g., "vit. b3" for "Vitamin B3") had no corresponding RxNorm code.

Using NDF-RT for classifying REP medication data. Regardless of the number of token permutations used to match a drug name, together with its signature elements, in the cTAKES NLP medication data processing, we concluded that it is highly unlikely that the entire entity will be discovered for every data instance. Because the system inherently suffers from a relatively low success rate in capturing the additional drug elements that would improve the accuracy and precision of the RxNorm-NDF-RT classification, we focused on building a better named-entity capturing mechanism that uses normalized terms and relies on a large vocabulary. The drug named-entity recognition (drug NER) annotator pipeline developed at Mayo (available via cTAKES) attempts to extract seven signature elements, including dosage, route, and form. It uses the 2010AB RxNorm release at two different stages while annotating the Orders97 medication data:
1. To tag the named entity using more generalized versions of the drug names from RxNorm.
2. To introduce the normalized forms of the drug captured by the drug NER annotator and build a query string that retrieves more specific versions of the drug mention carrying NDF-RT classification information.

Using this approach, the system was able to classify more than one third of the entire Orders97 medication data.
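The dictionary look-up described in Methods (phrases of at most three tokens, punctuation counted as tokens, plus a supplementary list of misspellings and abbreviations) can be sketched as a longest-match scan. In this minimal Python sketch a plain dict stands in for the Lucene index, the identifiers are hypothetical placeholders rather than real RxCUIs, and the greedy left-to-right, longest-window-first strategy is an assumption, not documented cTAKES behavior.

```python
import re

# A token is a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_dictionary(entries):
    """Map tokenized terms to identifiers; entries are (term, id) pairs."""
    return {tuple(tokenize(term)): rxcui for term, rxcui in entries}


def find_mentions(text, dictionary, max_tokens=3):
    """Greedy left-to-right scan, trying the longest window (<= max_tokens) first."""
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in dictionary:
                found.append((" ".join(key), dictionary[key]))
                i += n
                break
        else:  # no window starting at token i matched
            i += 1
    return found


terms = build_dictionary([
    ("acetaminophen", "RXCUI-A"),  # placeholder identifiers
    ("vitamin b3", "RXCUI-B"),
    ("vit. b3", "RXCUI-B"),        # supplementary abbreviation entry
])
print(find_mentions("Take Vit. B3 daily and acetaminophen prn", terms))
# [('vit . b3', 'RXCUI-B'), ('acetaminophen', 'RXCUI-A')]
```

Note that "vit. b3" already occupies three tokens once the period is counted, which illustrates why a supplementary entry for an abbreviation only matches if the window limit accommodates its punctuation.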

Table 2 shows the distribution of NDF-RT drug classes (VHA Drug Classes only) for the entire REP medication data evaluated in this study. We use the term "Drug_Class" to indicate that RxNorm term types SCD and/or SBD were used to determine the appropriate NDF-RT class, whereas "Generic_Class" indicates that RxNorm term types IN and PIN were used by our algorithm for classification. As shown, approximately 48% of REP drug terms with RxCUIs were assigned a VHA Drug Class via direct linkages between RxNorm and NDF-RT, whereas an additional 43% of REP drug terms were assigned a class indirectly (i.e., via Stage II of the algorithm).


Table 2 Distribution of NDF-RT classes for REP medication data

                 Unique NDF-RT drug   Unique REP drug terms   Unique REP drug terms   % of RxCUIs with
                 classes (used for    with NDF-RT drug        without NDF-RT drug     NDF-RT drug class
                 classification)      class assignment        class assignment        assignment
Drug_Class       347                  33,332                  35,917                  48%
Generic_Class    237                  29,155                  37,584                  43%
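Reading the last column of Table 2 as the share of terms with a class among all terms in that row (an assumption, since the table does not state its denominator), the two percentages can be checked directly from the counts:

```latex
\[
\frac{33{,}332}{33{,}332 + 35{,}917} = \frac{33{,}332}{69{,}249} \approx 0.481,
\qquad
\frac{29{,}155}{29{,}155 + 37{,}584} = \frac{29{,}155}{66{,}739} \approx 0.437
\]
```

That is, about 48.1% and 43.7%, close to the 48% and 43% reported in the table.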

Discussion
Mappings between RxNorm and NDF-RT concepts. As illustrated in Figure 1, the Semantic Clinical Drug (SCD) concepts in RxNorm have linkages to Clinical Drug concepts in NDF-RT. The primary challenge in mapping NDF-RT classes to drug mentions found in unstructured, or even semi-structured, text is that the mapping requires additional "signature elements". The signature elements of a drug mention are the descriptive terms that make up a clinical prescription, including drug form, route, and strength. Drug names alone, whether trade, generic, or brand names, are not sufficient for classification, because the ingredient that makes up a drug can take on different roles depending on how it is prescribed. Therefore, dosage information, including drug form, route, and strength, is necessary to determine the intended use for a particular application. This leads to several gaps between the drug vocabularies (and their respective signature elements) and the possible NDF-RT classes. As a consequence, even though RxNorm covers a broad range of clinical drug mentions, from simple ingredient names to long, complex descriptions of drug combinations with multiple phrases delimited by forward slashes and containing signature elements for each drug, a large portion of these RxNorm concepts will have no corresponding NDF-RT classification because of missing signature elements. Moreover, there are cases in which an RxNorm Semantic Branded Drug (SBD) concept has a mapping to an NDF-RT VHA Drug Class but the corresponding Semantic Clinical Drug (SCD) concept does not. For example, the SBD Twinject 0.3 MG per 0.3 ML Auto-Injector (RxCUI 727413) has an NDF-RT classification, but the corresponding SCD 0.3 ML Epinephrine 1 MG/ML Prefilled Syringe (RxCUI 727345) does not. We believe this limitation needs further consideration.
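To illustrate what extracting signature elements from a drug mention involves, here is a minimal regular-expression sketch in Python that pulls strengths and a dose form out of mention text. It is a hypothetical illustration, not the cTAKES drug NER annotator: the unit list and the small `DOSE_FORMS` vocabulary are assumptions chosen to cover the examples in this paper, and route is not handled.

```python
import re

# Number, optional space, then a unit; MG/ML is listed before MG so the
# longer unit wins, and the lookahead stops matches inside longer words.
STRENGTH_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(MG/ML|MCG|MG|ML|UNT|%)(?![A-Za-z])", re.IGNORECASE
)

# Illustrative vocabulary only; a real system would use RxNorm dose forms.
DOSE_FORMS = [
    "Extended Release Tablet", "Tablet", "Prefilled Syringe",
    "Auto-Injector", "Topical Solution",
]


def signature_elements(mention):
    """Return the strengths and the (longest matching) dose form in a mention."""
    strengths = [(float(v), u.upper()) for v, u in STRENGTH_RE.findall(mention)]
    lowered = mention.lower()
    form = next(
        (f for f in sorted(DOSE_FORMS, key=len, reverse=True) if f.lower() in lowered),
        None,
    )
    return {"strengths": strengths, "dose_form": form}


print(signature_elements("0.3 ML Epinephrine 1 MG/ML Prefilled Syringe"))
# {'strengths': [(0.3, 'ML'), (1.0, 'MG/ML')], 'dose_form': 'Prefilled Syringe'}
```

Applied to the Twinject SBD name and its SCD counterpart, such a parser recovers different strength and form elements from each string, which is one concrete way the SBD/SCD asymmetry above shows up at the text level.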
In addition, we observed that obsolete terms in RxNorm mainly affect SCD and SBD concepts. As of February 2nd, 2011, approximately 42% of SCD concepts and 20% of SBD concepts were designated as obsolete in the RxNorm database. Since these obsolete RxNorm drug concepts make up a large percentage (approximately 68%) of the overall RxNorm concepts, we argue that a means of tying obsolete concepts to existing terms is necessary, particularly for retrospective longitudinal clinical data coded with RxNorm and classified using NDF-RT drug classes. Furthermore, as shown in our prior work7, a key factor limiting the classification of drugs using RxNorm and NDF-RT is the lack of mappings between RxNorm SCDs and NDF-RT clinical drugs. Although in this study we devised an alternative approach to overcome this issue (Stage II of the class assignment algorithm), which increased the total percentage of REP terms classified under appropriate NDF-RT drug classes (43% for Generic_Class; Table 2), the robustness of class assignment ultimately depends on the robustness and accuracy of the mappings between the two terminologies. We plan to investigate this topic more rigorously in future work.

Accessibility via NLM's RxNorm and NDF-RT Web services API. NLM provides downloadable SQL scripts to install RxNorm in a relational database such as MySQL or Oracle. Although a SQL database offers a complete reference to the up-to-date drug and form information, our experience suggests that there is no easy way to reproduce the features offered by the RxNav Web services API. Specifically, the relationship atoms for drug form, precise ingredient (PIN), and tradename_of/has_tradename require complicated SQL queries. Hence, for practical purposes, we chose to use the RxNav Web services, which are offered as both SOAP-based and REST-based services, primarily because of their ease of use.
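As discussed below, batch querying against these services had to tolerate outages: the querying process paused and resent failed REST requests, and during extended outages routed requests to the SOAP equivalents for 100 iterations before returning to REST. A minimal Python sketch of that retry-then-fallback pattern follows. `primary` and `fallback` are hypothetical callables standing in for the REST and SOAP clients, which exceptions count as transient is an assumption, the pause and retry count are arbitrary defaults rather than values from the study, and failures of the fallback itself are not handled.

```python
import time


class FallbackClient:
    """Send each request via `primary`, pausing and retrying on transient
    errors; after `retries` consecutive failures, route this request and the
    next `fallback_calls - 1` requests to `fallback`, then try `primary` again."""

    def __init__(self, primary, fallback, retries=3, pause=1.0,
                 fallback_calls=100, transient=(ConnectionError, TimeoutError),
                 sleep=time.sleep):
        self.primary, self.fallback = primary, fallback
        self.retries, self.pause = retries, pause
        self.fallback_calls, self.transient = fallback_calls, transient
        self.sleep = sleep
        self.remaining = 0  # requests still to be sent via the fallback

    def call(self, request):
        if self.remaining > 0:
            self.remaining -= 1
            return self.fallback(request)
        for attempt in range(self.retries):
            try:
                return self.primary(request)
            except self.transient:
                if attempt < self.retries - 1:
                    self.sleep(self.pause)  # brief pause, then resend
        # Extended outage: switch transports for a fixed number of requests.
        self.remaining = max(self.fallback_calls - 1, 0)
        return self.fallback(request)
```

Injecting `sleep` keeps the class testable without real delays; in a batch job one would construct it once, for example as `FallbackClient(rest_lookup, soap_lookup)` with lookup functions you supply, and call `client.call(rxcui)` for every medication.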
However, at the time of writing this manuscript, the source code for the API is not open source. Consequently, the end user has no control over its behavior (e.g., accessing a particular RxNorm release), cannot incorporate new functionality or modify existing functionality, and faces a potential risk of denial of service. Similar to the RxNav API, NLM provides SOAP-based and REST-based services for querying NDF-RT. Although the SOAP services are relatively straightforward to implement (client-side querying), they also take more time to return results. This proved to be a bottleneck in processing the high-throughput Orders97 data for RxNorm coding: at approximately one second per medication, it took about a week of continuous processing time. The REST-based approach, while more complex to set up initially, provides better transaction processing options that are more conducive to batch processing. We further found that the REST-based service was prone to intermittent drops in server uptime that interrupted processing. Typically, suspending the thread for a short period and resending the request circumvented these issues. Occasionally, however, there were extended periods of unavailability, during which our querying process used the equivalent SOAP-based functions for 100 iterations, after which the system attempted to return to the REST-based counterpart. Figure 4 (above) provides pseudocode for the algorithm using the SOAP-based API calls. NLM provides additional information on the mapping between SOAP and REST service functions and API calls on its website14.
Complexity and Usability of NDF-RT. Similar to our earlier investigation8, the REP investigators found NDF-RT complex to navigate and use without prior guidance and training. In particular, issues such as the lack of metadata annotations on NDF-RT drug classes came to the forefront.
As an example, the Legacy VHA drug class “Loop Diuretics” and the External Pharmacologic class “Loop Diuretic” are distinguished only by a slight difference in their labels (apart from their concept unique identifiers), with no additional annotation describing how they differ or relate. Consequently, someone unfamiliar with the NDF-RT multi-axial classification runs the risk of choosing the wrong classification for her application. We believe that such issues should be addressed in future NDF-RT releases by following best practices for terminology and vocabulary development18.
Limitations and Future Work. In this study, we limited our investigation to VHA drug classes within NDF-RT. It remains to be seen how other NDF-RT axes (e.g., Mechanism of Action, Pharmacokinetics) can be used to classify and aggregate medication information, and to study how it relates to patient data and outcomes for REP investigators. Additionally, this study evaluated drug classification from only one terminology, NDF-RT. In the future, we plan to expand our investigation by incorporating other publicly available terminology resources for drug classification, such as SNOMED CT, which can be queried through the RxNav Web services API.
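The retry-and-fallback strategy used with the NLM Web services (pausing and resending requests after transient REST failures, and routing 100 requests to the equivalent SOAP functions during extended outages) can be sketched as a small client wrapper. This is a minimal sketch under stated assumptions: the class `RestWithSoapFallback` and its parameters are hypothetical, and `rest_call` and `soap_call` stand in for whatever functions actually issue the REST and SOAP requests, which are not reproduced here. Network failures are assumed to surface as `OSError` subclasses, as Python's `ConnectionError` and `urllib.error.URLError` do.

```python
import time
from typing import Any, Callable


class RestWithSoapFallback:
    """Hypothetical wrapper: REST first, short retries, then a window of SOAP calls."""

    def __init__(
        self,
        rest_call: Callable[[str], Any],
        soap_call: Callable[[str], Any],
        max_retries: int = 3,
        retry_delay_s: float = 1.0,
        soap_window: int = 100,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rest_call = rest_call
        self.soap_call = soap_call
        self.max_retries = max_retries
        self.retry_delay_s = retry_delay_s
        self.soap_window = soap_window
        self.sleep = sleep  # injectable so tests need not actually wait
        self._soap_remaining = 0  # requests still to be routed to SOAP

    def lookup(self, term: str) -> Any:
        # During an extended outage, keep using SOAP until the window is used up.
        if self._soap_remaining > 0:
            self._soap_remaining -= 1
            return self.soap_call(term)

        for attempt in range(self.max_retries):
            try:
                return self.rest_call(term)
            except OSError:
                # Transient failure: pause briefly, then resend (unless out of attempts).
                if attempt < self.max_retries - 1:
                    self.sleep(self.retry_delay_s)

        # REST still unavailable: answer this request via SOAP and route the
        # next (soap_window - 1) requests to SOAP as well before trying REST again.
        self._soap_remaining = self.soap_window - 1
        return self.soap_call(term)


def unavailable_rest(term: str) -> str:
    raise ConnectionError("REST service unavailable")


client = RestWithSoapFallback(unavailable_rest, lambda t: f"soap:{t}", sleep=lambda s: None)
print(client.lookup("metoprolol"))
```

Injecting the sleep function keeps the wrapper testable. A growing pause between attempts (exponential backoff) is a common refinement, although the study describes only a short pause before resending. At roughly one second per medication, a week of continuous processing corresponds to on the order of 600,000 lookups (7 × 24 × 3,600 = 604,800 seconds), so a wrapper like this lets a long batch survive outages instead of restarting.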

Conclusion
RxNorm and NDF-RT are publicly available federal medication terminologies. In this work, we studied how structured querying and natural language processing can be applied to extract medication data from two different EHR systems in the Rochester Epidemiology Project, and how the extracted data can be represented and classified using terms and concepts from RxNorm and NDF-RT. Throughout this process, we leveraged the recently released NLM Web services APIs for accessing and querying RxNorm and NDF-RT. Our investigation identified potential limitations of the existing classification system, as well as various issues in the specification of correspondences between concepts in RxNorm and NDF-RT. Our proposals and methods provide preliminary steps toward addressing some of these issues.

Acknowledgment. This research is funded in part by the Rochester Epidemiology Project (AG034676-45) and Mayo Clinic Early Career Award to the first author.

References
1. Liu S, Ma W, Moore R, Ganesan V, Nelson S. RxNorm: Prescription for Electronic Drug Information Exchange. IT Professional. 2005;7(5):17-23.
2. Brown S, Elkin P, Rosenbloom T, et al. VA National Drug File Reference Terminology: A Cross-Institutional Content Coverage Study. MedInfo: Studies in Health Technology and Informatics. 2004:477-481.
3. Bodenreider O. Biomedical Ontologies in Action: Role in Knowledge Management, Data Integration and Decision Support. In: Geissbuhler A, Kulikowski C, eds. IMIA Yearbook of Medical Informatics. Vol 47: International Medical Informatics Association; 2008:67-79.
4. First DataBank. http://www.firstdatabank.com/. Accessed February 23, 2011.
5. Micromedex. http://www.micromedex.com. Accessed February 23, 2011.
6. Cerner Multum. http://www.multum.com/. Accessed February 23, 2011.
7. Pathak J, Chute C. Analyzing categorical information in two publicly available drug terminologies: RxNorm and NDF-RT. Journal of the American Medical Informatics Association. 2010;17(4):432-439.
8. Pathak J, Richesson R. Use of Standard Drug Vocabularies in Clinical Research: A Case Study in Pediatrics. American Medical Informatics Association (AMIA) Annual Symposium. 2010:607-611.
9. Bouhaddou O, Warnekar P, Parrish F, et al. Exchange of Computable Patient Data between the Department of Veterans Affairs (VA) and the Department of Defense (DoD): Terminology Mediation Strategy. Journal of the American Medical Informatics Association. 2008;15(2):174-183.
10. Burton MM, Simonaitis L, Schadow G. Medication and Indication Linkage: A Practical Therapy for the Problem List? AMIA Annual Symposium. 2008:86-90.
11. Palchuk M, Klumpenaar M, Jatkar T, Zottola R, Adams W, Abend A. Enabling Hierarchical View of RxNorm with NDF-RT Drug Classes. AMIA Annual Symposium. Washington, DC; 2010:577-581.
12. Melton L. History of the Rochester Epidemiology Project. Mayo Clinic Proceedings. 1996;71(3):266-274.
13. St. Sauver J, Grossardt B, Yawn B, Melton J, Rocca W. Use of a Medical Records Linkage System to Enumerate a Dynamic Population Over Time: The Rochester Epidemiology Project. American Journal of Epidemiology. 2011.
14. National Library of Medicine. NDF-RT Web Services API. Last updated February 16, 2011. http://rxnav.nlm.nih.gov/NdfrtAPI.html. Accessed March 16, 2011.
15. Zeng K, Bodenreider O, Kilbourne JT, Nelson SJ. RxNav: Towards an integrated view on drug information. Medinfo. 2007:P386.
16. Savova G, Masanz J, Ogren P, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association. 2010;17(5):507-513.
17. Bodenreider O, Mougin F, Burgun A. Automatic Determination of Anticoagulation Status with NDF-RT. 13th ISMB Special Interest Group Meeting on Bio-Ontologies. 2010:140-143.
18. Cimino J. Desiderata for Controlled Medical Vocabularies in the Twenty-First Century. Methods of Information in Medicine. 1998;37(4-5):394-403.
