
Data, Network, and Application: Technical Description of the Utah RODS Winter Olympic Biosurveillance System

Fu-Chiang Tsui PhD1,2, Jeremy U. Espino MD1,2, Michael M. Wagner MD, PhD1,2, Per Gesteland MD4, Oleg Ivanov MD1,2, Robert T. Olszewski PhD1,2, Zhen Liu1,2, Xiaoming Zeng MD1,2, Wendy Chapman PhD1,2, Weng Keen Wong MS2,3, Andrew Moore PhD2,3

1The RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA; 2Biomedical Security Institute, University of Pittsburgh and 3Carnegie Mellon University, Pittsburgh, PA; 4University of Utah, Salt Lake City, UT

AMIA 2002 Annual Symposium Proceedings

Given the post-September 11th climate of possible bioterrorist attacks and the high-profile 2002 Winter Olympics in Salt Lake City, Utah, we challenged ourselves to deploy a computer-based, real-time, automated biosurveillance system for Utah, the Utah Real-time Outbreak and Disease Surveillance system (Utah RODS), in six weeks using our existing Real-time Outbreak and Disease Surveillance (RODS) architecture. During the Olympics, Utah RODS received real-time HL-7 admission messages from 10 emergency departments and 20 walk-in clinics. It collected free-text chief complaints, categorized them into one of seven prodrome classes using natural language processing, and provided a web interface for real-time display of time series graphs, geographic information system output, outbreak algorithm alerts, and details of the cases. The system detected two possible outbreaks that were dismissed as the natural result of increasing rates of influenza. Utah RODS allowed us to further understand the complexities underlying the rapid deployment of a RODS-like system.

INTRODUCTION

Bioterrorism attacks and threats have made timely detection of outbreaks increasingly critical. The economic cost of an outbreak, such as anthrax, is high without early intervention [1]. Real-time computer-based biosurveillance systems that monitor a large proportion of the population can give public health officials the current "health status" of a region's population and potentially reduce the morbidity, mortality, and cost of an outbreak. President George W. Bush likened the Real-time Outbreak and Disease Surveillance (RODS) system developed at the University of Pittsburgh to a modern-day DEW line [2] (Distant Early Warning line: a line of radar stations near the 70th parallel across the North American continent, maintained by the United States and Canada and intended to give advance warning of approaching enemy aircraft and missiles) [3].

International and national events, such as the Olympics, pose even greater challenges to public health surveillance due to their potential as bioterrorism targets. These events have increasingly been accompanied by computer-based biosurveillance systems [4,5].

The basic components of a computer-based biosurveillance system include data acquisition, secure networks, a database, and applications. Computer-based biosurveillance is data driven, so data acquisition is the crucial first step for a surveillance system. Data acquisition may occur in "batch mode", in which data are collected at predetermined times, or in "real time", in which data are transmitted immediately from data sources as they are generated. Data acquisition can also be manual or automatic. Automatic methods have the advantage of not changing the workflow of the healthcare staff. Secure networks ensure connectivity and data security between the data providers, the surveillance system, and users. Applications include natural language processors (NLP), detection algorithms, notification systems, and a user interface. Geographic information systems (GIS) allow geospatial visualization of the data. Detection algorithms monitor spatio-temporal abnormalities in the data. Notification systems page or email users when detection algorithms signal a potential outbreak. Natural language processors classify cases into a prodrome category (e.g., respiratory, diarrheal) based on free-text descriptions of the patients.

In this paper, we describe the data, network, and applications of Utah RODS, which was used to remotely monitor the health status of Utah from the University of Pittsburgh during the 2002 Winter Olympics.

DATA LEVEL

Data Providers

Utah RODS receives admission, discharge, and transfer (ADT) data from emergency departments and walk-in clinics of two health systems, Intermountain Health Care (IHC) and University of Utah Health Sciences Center (UUHSC). There are 9 emergency departments and 19 acute care facilities at IHC and one emergency department and one acute care facility at UUHSC. Together these emergency rooms and walk-in clinics serve approximately 70% of the population of Utah.


Data Acquisition

Data acquisition involves several technical components: messaging protocol, data type, and data elements. For the messaging protocol, we used Health Level 7 (HL-7) [6], which is recommended as a public health messaging standard by the Centers for Disease Control and Prevention in their National Electronic Disease Surveillance System architecture [7]. We collected ADT messages as our data type. The data elements we collected include medical record number, ADT type (admission or discharge), visit number, gender, age, home zip code, work zip code, and the free-text chief complaint.

We employed a standard client-server method for receiving HL-7 data: HL-7 listeners and HL-7 parsers. The HL-7 listeners establish TCP/IP connections with IHC and UUHSC. The HL-7 parser uses regular expressions to parse each data segment in an HL-7 message. We stored the parsed ADT messages in an Oracle8i database for data retrieval and analysis. Figure 1 shows a sample HL-7 message from one of the health systems.

Figure 1: Sample HL-7 ADT message (the message text is garbled in this copy; legible fragments include the MSH, PID, PV1, and DG1 segment headers, event type A04, patient class E, and the chief complaint "SORE THROAT,COUGH")

Utah RODS has methods to improve data quality. It rejects duplicate messages and messages from ADT scheduling systems. To prevent hospitals from sending duplicate records for the same patient visit, we use primary keys at the database level to block duplicates. The primary key for HL-7 ADT data comprises sending facility, ADT message type, medical record number, patient class, and visit number. Utah RODS also filters out ADT messages whose admission date and time lie in the future, which happens when patient scheduling information appears in the HL-7 ADT data feed.

With any system that operates remotely, there is the possibility of losing a data feed. Utah RODS monitors all data feeds to make sure that data are continuously being sent from the health systems. If Utah RODS stops receiving data from an HL-7 feed, it sends an alert to the administrator to ensure data integrity.

NETWORK LEVEL

The communications network between Utah RODS and the data providers consists of virtual private networks (VPNs) and leased lines. Site-to-site IPsec VPNs were initially established with the help of our industry partner, Siemens Medical Systems (SMS), before the start of the Winter Olympics. We used low-cost Cisco PIX 501s for this purpose.

Because of concerns about possible communications failure, SMS offered to set up leased lines between the entities during the Olympics and the Paralympics (February and March). The leased lines consist of pairs of 128k fractional T1 lines attached to pairs of Cisco 2600 routers in fail-over mode at each endpoint. The leased lines were configured in a star configuration with Siemens network operations at the center. This configuration allowed for continuous monitoring of the communications links. To address concerns about misuse of the surveillance network (e.g., accessing data to determine the market share of a rival healthcare competitor), the firewalls of the VPN and leased-line hardware prevented SMS, IHC, and UUHSC from connecting to each other. The internal RODS network consists of a private class B network operating over switched 100baseT Ethernet.
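As an illustration of the data-level processing described above (regular-expression segment recognition, a composite primary key that blocks duplicates, and rejection of future-dated admissions), the following Python sketch may help. It is not the Utah RODS code: the sample message, the field positions, the table layout, and the function names are all assumptions made for this example, and an in-memory SQLite database stands in for the Oracle8i database.

```python
import re
import sqlite3
from datetime import datetime

# Segments this sketch cares about (assumed; the paper does not list them).
SEGMENT_RE = re.compile(r"^(MSH|PID|PV1|DG1)\|")


def make_segment(name, values):
    """Build 'NAME|f1|f2|...' from {position: text} (helper for sample data)."""
    fields = [""] * max(values)
    for pos, text in values.items():
        fields[pos - 1] = text
    return "|".join([name] + fields)


def sample_message(control_id, admitted="200202241700"):
    """Hypothetical ADT message; field positions are assumptions of this sketch."""
    return "\r".join([
        make_segment("MSH", {1: "^~\\&", 3: "FACILITY_A", 8: "ADT^A04", 9: control_id}),
        make_segment("PID", {3: "123456789", 8: "M"}),
        make_segment("PV1", {2: "E", 19: "987654321", 44: admitted}),
        make_segment("DG1", {3: "SORE THROAT,COUGH"}),
    ])


def parse_adt(message):
    """Extract the data elements used below from one ADT message."""
    seg = {}
    for line in re.split(r"[\r\n]+", message.strip()):
        if SEGMENT_RE.match(line):
            seg[line[:3]] = line.split("|")   # seg[name][n] is the n-th field
    return {
        "facility": seg["MSH"][3],
        "adt_type": seg["MSH"][8].split("^")[-1],
        "mrn": seg["PID"][3],
        "patient_class": seg["PV1"][2],
        "visit": seg["PV1"][19],
        "admitted": datetime.strptime(seg["PV1"][44], "%Y%m%d%H%M"),
        "complaint": seg["DG1"][3],
    }


def open_store():
    """In-memory table keyed on the five fields the paper names as the primary key."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE adt (
        facility TEXT, adt_type TEXT, mrn TEXT, patient_class TEXT, visit TEXT,
        admitted TEXT, complaint TEXT,
        PRIMARY KEY (facility, adt_type, mrn, patient_class, visit))""")
    return db


def store(db, rec, now):
    """Return 'stored', 'duplicate', or 'future' for one parsed record."""
    if rec["admitted"] > now:                 # scheduling data: admission not yet happened
        return "future"
    try:
        db.execute("INSERT INTO adt VALUES (?,?,?,?,?,?,?)",
                   (rec["facility"], rec["adt_type"], rec["mrn"],
                    rec["patient_class"], rec["visit"],
                    rec["admitted"].isoformat(), rec["complaint"]))
    except sqlite3.IntegrityError:            # primary key already present
        return "duplicate"
    return "stored"


db = open_store()
now = datetime(2002, 2, 24, 18, 0)            # fixed clock for a repeatable example
for msg in (sample_message("MSG1"), sample_message("MSG2"),
            sample_message("MSG3", admitted="200203011000")):
    print(store(db, parse_adt(msg), now))
```

Because the key omits the message control ID, a repeat of the same visit sent under a new control ID is still rejected as a duplicate.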


Figure 2: Utah RODS Network Architecture

System Hardware

Utah RODS processes run on dedicated servers: an Internet firewall, a database server, a web server, a GIS server, and a natural language processing server.

* The Internet firewall is a Dell PowerEdge 350 configured with an 850 MHz Pentium III processor and 256 megabytes of RAM, running ipfilter (firewall) and ipnat (network port/address translator) on NetBSD 1.5.2.
* The database server is a Sun Microsystems Enterprise 250 configured with two UltraSPARC II 400 MHz processors, 2 gigabytes of RAM, and 36 gigabytes of mirrored hard drive space, running Oracle 8.1.7 (database) on Solaris 8.
* The web server is a Dell PowerEdge 1550 configured with two 1 GHz processors, 1 gigabyte of RAM, and 36 gigabytes of RAID5 storage, running Apache 1.3.22 (web server), Jetty 3.1.3 (Java servlet engine), and JBoss 2.4.3 (Java application server) on Red Hat Linux 7.1.
* The GIS server is a Dell PowerEdge 1550 configured with two 1 GHz Pentium III processors, 512 megabytes of RAM, and 18 gigabytes of RAID5 storage, running Apache 1.3.19 (web server), Tomcat 3.3 (Java servlet engine), ArcIMS 3.1 (Internet GIS), and ArcSDE 8.1 (spatial database) on Windows 2000 Advanced Server.
* The natural language processing server is an Apple Macintosh G4 configured with an 800 MHz PowerPC G4 processor, 256 megabytes of RAM, and 40 gigabytes of hard drive space.

Backup is performed nightly on all machines using a Sun StorEdge L9 Tape Autoloader attached to the database server and Veritas NetBackup software.

APPLICATION LEVEL

Natural Language Processors (NLPs): Utah RODS uses two NLP-based classifiers that take free-text chief complaints as input and classify patients into a prodrome category. These NLPs, Bigram [8] and PLUS [10], map a chief complaint into one of seven prodromes: respiratory, diarrheal, botulinic, viral, encephalitic, hemorrhagic, and rash. Bigram, developed at the University of Pittsburgh, is a simple NLP that computes prodrome probability based on pairs of words in a free-text chief complaint. PLUS, developed at the University of Utah, similarly computes prodrome probability using a more sophisticated Bayesian network. Both natural language processing systems operate in real time. The PLUS system was originally developed as an off-line system, but we adapted it to perform real-time processing using client-server TCP/IP socket connections. Whenever a chief complaint is available for processing, the RODS server sends a message to PLUS on the NLP server, which returns the classification of the case based on the free-text chief complaint. Bigram runs as a local process on the RODS database server.

Detection Algorithms: RODS employs two algorithms to detect abnormalities: a recursive least squares (RLS) adaptive filter and WSARE (What's Strange About Recent Events). RLS, a dynamic autoregressive linear model, predicts the current count of each prodrome within a region based on historical data and adjusts its model coefficients based on prediction errors [11].
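As a rough illustration of the RLS approach, the following Python sketch fits an autoregressive model with the standard recursive least squares update and flags a count that exceeds an upper bound on its prediction. The model order, forgetting factor, initialization, warm-up length, and the way the bound is computed (1.96 times the root-mean-square of past prediction errors) are assumptions made for this example. The paper does not specify them, and this is not the RODS implementation.

```python
import math


class RLSPredictor:
    """Autoregressive count predictor updated by recursive least squares."""

    def __init__(self, order=3, forgetting=0.98, delta=100.0, warmup=5):
        self.order = order
        self.lam = forgetting            # forgetting factor (assumed value)
        self.warmup = warmup             # past errors required before alerting
        self.w = [0.0] * order           # model coefficients
        # Inverse correlation matrix estimate, initialised to delta * I.
        self.P = [[delta if i == j else 0.0 for j in range(order)]
                  for i in range(order)]
        self.history = []
        self.sq_err = 0.0
        self.n_err = 0

    def step(self, count):
        """Consume one observed count; return (prediction, upper, alert)."""
        n = self.order
        if len(self.history) < n:        # not enough history to predict yet
            self.history.append(count)
            return None, None, False
        x = self.history[-n:]
        pred = sum(wi * xi for wi, xi in zip(self.w, x))
        err = count - pred

        # Bound uses only errors seen before this observation (assumed rule).
        upper, alert = None, False
        if self.n_err >= self.warmup:
            rms = math.sqrt(self.sq_err / self.n_err)
            upper = pred + 1.96 * rms
            alert = count > upper

        # Standard RLS update: gain vector, coefficients, then P.
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        self.w = [wi + gi * err for wi, gi in zip(self.w, gain)]
        xP = [sum(x[i] * self.P[i][j] for i in range(n)) for j in range(n)]
        self.P = [[(self.P[i][j] - gain[i] * xP[j]) / self.lam
                   for j in range(n)] for i in range(n)]

        self.sq_err += err * err
        self.n_err += 1
        self.history.append(count)
        return pred, upper, alert


model = RLSPredictor()
daily_counts = [20] * 30 + [60]          # hypothetical counts with a final spike
alerts = [model.step(c)[2] for c in daily_counts]
print(alerts[-1])                        # True: the spike exceeds the bound
```

The coefficients adapt after every observation, so the model tracks slow changes in the baseline while a sudden jump still stands out against recent prediction errors.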
An alert is triggered when RLS finds that the current count exceeds the upper limit of the 95% confidence interval of its predicted count. Instead of predicting aggregate counts in one dimension, WSARE performs a heuristic search over combinations of features for significant increases over a given time period [12]. Such features include all aspects of recent patient records, including detailed and prodromal categories, demographics of patients, geographical information about patients, and other information. The criterion for sending a WSARE alert is that there has been an increase in the number of patients with specific characteristics and the p value of the increase is less than or equal to 0.05.
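The WSARE alert criterion can be illustrated with a much-simplified Python sketch. It scores only single feature-value rules, compares recent records with a baseline using a one-sided Fisher exact test, and reports the best rule if its p value is at most 0.05. The choice of test, the record layout, and the function names are assumptions for this example. The actual WSARE algorithm [12] searches combinations of features and is not reproduced here, and this sketch applies no correction for testing many rules.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    a: recent records matching the rule, b: recent records not matching,
    c: baseline records matching,        d: baseline records not matching.
    """
    n_recent, n_match, n_total = a + b, a + c, a + b + c + d
    upper = min(n_recent, n_match)
    tail = sum(comb(n_match, k) * comb(n_total - n_match, n_recent - k)
               for k in range(a, upper + 1))
    return tail / comb(n_total, n_recent)


def scan(recent, baseline, alpha=0.05):
    """Return (feature, value, p) for the most significant increase, or None."""
    best = None
    rules = sorted({(f, v) for r in recent for f, v in r.items()})
    for feature, value in rules:
        a = sum(1 for r in recent if r.get(feature) == value)
        c = sum(1 for r in baseline if r.get(feature) == value)
        b, d = len(recent) - a, len(baseline) - c
        if a * len(baseline) <= c * len(recent):
            continue                      # share of matching patients did not rise
        p = fisher_one_sided(a, b, c, d)
        if best is None or p < best[2]:
            best = (feature, value, p)
    return best if best is not None and best[2] <= alpha else None


# Hypothetical records: viral cases rise from 4 of 40 to 7 of 10.
baseline = ([{"prodrome": "respiratory", "county": "Salt Lake"}] * 36
            + [{"prodrome": "viral", "county": "Salt Lake"}] * 4)
recent = ([{"prodrome": "viral", "county": "Salt Lake"}] * 7
          + [{"prodrome": "respiratory", "county": "Salt Lake"}] * 3)
print(scan(recent, baseline))
```

A rule is reported only when the share of matching patients has risen, which mirrors the requirement that the alert reflect an increase and not merely a difference.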

RLS and WSARE run as local processes on the RODS database server every four hours.

Alert Management and Notification: When the detection algorithms detect an abnormal event, RODS sends page and email notifications to an investigation group composed of several medical doctors, an epidemiologist, a biostatistician at the Utah Department of Health (UDOH), and RODS administrators. The group examines the data, including the cases, to determine whether the alert has any significance.


Figure 3: Main screen of Utah RODS


Figure 4: Epiplot screen of Utah RODS showing the number of visits to emergency rooms and walk-in clinics from February 8th to February 24th during the 2002 Winter Olympics

User Interface: The user interface for Utah RODS is provided through an encrypted, password-protected website. The interface is organized as three screens: Main, Epiplot, and Mapplot. The Main screen (Figure 3) simultaneously shows eight time series plots corresponding to the daily total visits and the seven prodromes for the past week. The user can view these graphs by county or for the whole state. The user uses the Epiplot screen to generate customized time series plots and to retrieve case details based on prodrome, region, start date, and end date, as shown in Figure 4. The Mapplot screen is an interface to ArcIMS, an Internet GIS server developed by Environmental Systems Research Institute, Inc. (ESRI). A user can use Mapplot to display the proportion of a particular prodrome out of all the acute healthcare visits within zip codes of the state of Utah. We represented zip codes with a higher proportion of cases with darker shades of gray. The GIS server also overlays state boundaries, county boundaries, water bodies, hospital locations, landmarks, streets, and highways with the public health data, as shown in Figure 5. Epiplot and Mapplot can display case details based on the role of the user. For example, IHC users are only allowed to see case details about IHC patients. These case details include the medical record number, age, gender, home zip code, work zip code, and free-text chief complaint of the patient.

Figure 5: Mapplot screen of Utah RODS showing spatial distribution of respiratory cases.

RESULTS

We started receiving real-time HL-7 ADT data from the data providers on Jan. 30, 2002. The daily average number of visits during the Olympics (February 8 to February 24, 2002) was approximately 2800, as shown in Figure 4. A total of 31 users logged in to the user interface 233 times. They viewed the graphs on the Main screen 1244 times, generated 702 custom graphs with Epiplot, and accessed Mapplot 511 times; 12 users who had the required access rights requested 618 sets of detailed case data.

During the first three days of data collection, we found that some data from the two health systems could affect the data integrity of the surveillance system. One health system continuously sent triplicates with different HL-7 message control IDs to RODS for the same patient admission. The other health system sent outpatient appointment scheduling messages on the HL-7 ADT feed to RODS. These problems were addressed using the techniques described in the data level section of this paper.

During the two-week period of the Olympics, one RLS alert and one WSARE alert were fired. The RLS alert occurred because of a sudden increase in the number of "viral" cases in one county (from two cases to seven cases). WSARE found an abnormal increase in total visits in another county. Within an hour after the investigation group received an alert, members of the group used the web interface to acquire more data about the alert, assessed whether the alert had any significance, and emailed a summary of their investigations to the rest of the group. In each case the alert was attributed to the influenza season and not a bioterrorism attack.

DISCUSSION

Given the short two-week period of the 2002 Winter Olympics and, fortunately, no known outbreaks in the region, it is hard to evaluate the performance of every aspect of RODS. However, we did learn lessons from deploying a remote biosurveillance system within a six-week period.

We found little difficulty in establishing HL-7 data feeds between the data providers and Utah RODS. The engineers at IHC and UUHSC were already experienced with establishing connections with various parts of their healthcare information systems. According to a recent study of different surveillance systems [13], the central problem with most surveillance systems is the lack of HL-7 as the message protocol. The National Electronic Disease Surveillance System (NEDSS) standards, developed by the Centers for Disease Control and Prevention (CDC), emphasize the use of standard communication protocols like HL-7 and XML. After deploying Utah RODS, we strongly suggest the use of HL-7 as the messaging protocol for interfacing with healthcare systems.

Another lesson that we learned was the value of industry partners. At the beginning of this project, the RODS laboratory did not have an experienced Cisco network engineer for the deployment. SMS allowed us to outsource all of our networking issues: leased lines, router configuration, and network monitoring. With their help, we spent a minimal amount of time (two days) installing pre-configured VPN and leased-line routers. ESRI is a GIS solutions provider and the developer of the GIS system (ArcIMS) used by the Utah RODS web interface. Their assistance to our GIS team greatly reduced the time for implementation, and their technical advice improved the performance of our GIS system.

Our past experience of more than three years in building computer-based biosurveillance systems for western Pennsylvania proved to be invaluable in deploying Utah RODS. All of our original program code was easily configured and adapted. Our programmers even had time to rewrite most of the original code, written in Perl, into Java/J2EE, resulting in a dramatic performance increase of our system.

We learned that some aspects of computer-based biosurveillance systems can affect accuracy and usability. We note that duplicate messages and scheduled appointments were sent from the data providers to our server during the Olympics, possibly distorting the output of the system. Generally, data from hospitals may contain redundancies and noise that a surveillance system needs to reconcile and filter. When we first discovered the above problem, we asked the data provider to remove the duplicate HL-7 messages. They were not able to remove them. We worked around this problem by filtering records and discarding messages that were not unique after they were received and parsed, but the method we employed relies on having a medical record number and visit number. Unfortunately, not every health system is willing to provide this type of patient-sensitive information for a surveillance system. Thus, a future solution to this problem is to install a system within a healthcare system that filters out duplicates, deidentifies records, and allows a computer-based biosurveillance system to retrieve identifiers if public health officials need to do so.

An important usability feature of one version of RODS deployed in Pittsburgh is the ability to directly access the electronic medical record (EMR) from the RODS interface. The physicians and epidemiologists we have worked with in Pittsburgh report that this is one of the most important features in RODS for distinguishing between false alarms, natural outbreaks, and bioterrorist attacks. During our assessment, we decided that integrating the RODS user interface with the EMRs of IHC and UUHSC would take more time than we were allotted. We hope that in the future EMR systems will have methods that allow direct connectivity with computer-based biosurveillance interfaces.

CONCLUSION

The Utah RODS project demonstrates the feasibility of the rapid deployment of a real-time automated computer-based biosurveillance system.

ACKNOWLEDGEMENTS

University of Pittsburgh: Kimberlee Barnhart, Mary Cleatus Szczepaniak, Gregory Cooper, John Levander, Hassan Karini, Aisha Mitchell, and Chris Jursa

University of Utah Health Sciences Information Technology Services Interface and Network Ops: Jim Livingston, Mark Beekhuizen, Kris Lundell, and Gary Vandertoolen

Intermountain Health Care's Datagate and WAN teams: Mike Noble, Noel Santamore, Maria Wisneiwska

Utah Department of Health and the County Health Departments of Davis, Morgan, Salt Lake, Summit, Utah, Wasatch, and Weber: Robert Rolfs

Special thanks to SMS for providing networking services: Dot Powers and Dennis Matyas

Special thanks to ESRI for assisting with the GIS services: Bill Davenhall, Lori Shienvold

This work was supported by grants GO8 LM06625-01 and T15 LM/DE07059 from the National Library of Medicine; Defense Advanced Research Projects Agency; contract 290-00-0009 from the Agency for Healthcare Research and Quality. This paper was also supported by Cooperative Agreement Number U90/CCU318753-01 and UPO/CCU318753-02 from the Centers for Disease Control and Prevention (CDC). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC.

REFERENCES

1. Kaufmann A, Meltzer M, Schmid G. The economic impact of a bioterrorist attack: Are prevention and postattack intervention programs justifiable? Emerging Infectious Diseases 1997;3(2):83-94.
2. Bush GW. President increases funding for bioterrorism by 319 percent [online] 2002 [cited 2002 March 6]. Available from: http://www.whitehouse.gov/news/releases/2002/02/20020205-4.html.
3. Pickett J, editor. The American Heritage dictionary of the English language. 4th ed. Boston: Houghton Mifflin Company; 2000.
4. Paulson T. Region alert to bioterror, but health-care system underfunded [online] 2001 [cited 2002 March 6]. Available from: http://seattlepi.nwsource.com/local/40829_bio29.shtml.
5. Pueschel M. DARPA system tracked inauguration for attack [online] 2001 [cited 2002 March 6]. Available from: http://www.usmedicine.com/article.cfm?articleID=172&issueID=25.
6. HL7. Health level 7 [online] 2002 [cited 2002 March 6]. Available from: http://www.hl7.org/.
7. NEDSS systems architecture [online] 2001 [cited 2002 March 6]. Available from: http://www.cdc.gov/od/hissb/docs/NEDSSsysarch2.0.pdf.
8. Olszewski RT. Bayesian classification of triage diagnoses for the early detection of epidemics. Submitted: AMIA Fall Symposium; 2002.
9. Chapman W, Christensen L, Wagner M, Haug P, Ivanov O, Dowling J, et al. Public health syndromic detection from free-text triage diagnoses: Evaluation of a medical language processing system before deployment in the Winter Olympics. Submitted: AMIA Fall Symposium; 2002.
10. Christensen L, Haug P, Fiszman M, Chapman W. PLUS: A probabilistic language understanding system. Submitted: Proc. Assoc. for Comp. Ling.; 2002.
11. Orfanidis SJ. Optimum signal processing. 2nd ed. New York: McGraw-Hill; 1988.
12. Wong W, Moore AW, Cooper G, Wagner M. Rule-based anomaly pattern detection for detecting disease outbreaks: Carnegie Mellon University, School of Computer Science; 2002 Feb. Report No.: CMU-CS-02-106.
13. Lober W, Karras B, Wagner M, Overhage J, Davidson A, Fraser H, et al. Roundtable on bioterrorism detection: Information system-based surveillance. Journal of the American Medical Informatics Association 2002.
