Extensible Markup Language (XML) in Health Care: Integration of Structured Reporting and Decision Support
Charles E. Kahn, Jr., M.D., Norberto B. de la Cruz, Ph.D.
Office of Clinical Informatics, Medical College of Wisconsin, Milwaukee, Wisconsin
Abstract The Extensible Markup Language (XML) is a newly adopted Internet protocol for data interchange designed to bring the key features of the Standard Generalized Markup Language (SGML; ISO 8879:1986) - extensibility, complex structures, and validation - to the World Wide Web. In this paper, we describe an architecture that uses XML to mediate between disparate client-server systems for structured reporting and decision support.
Keywords: XML; SGML; Standards; Structured Reporting; Decision Support; Computer-based Patient Record
INTRODUCTION
World Wide Web: from HTML to XML The World Wide Web underlies a growing number of systems for medical records, knowledge sources, decision support, and education [1,2], and is poised to transform the nature of medical practice [3]. Current Web-based applications depend primarily on the Hypertext Markup Language (HTML). HTML consists of a simple, fixed set of elements - indicated by angle-bracketed "tags" (e.g., ) - that describe how documents should be displayed by Web client (browser) programs. It defines a single, fixed type of document with markup that describes headings, paragraphs, lists, and illustrations, with some provision for hypertext and multimedia.
HTML is a specific application of the Standard Generalized Markup Language (SGML), the international standard (ISO 8879:1986) for defining the structure and content of different types of electronic documents [4-6]. SGML is a metalanguage: it allows authors to create markup languages for specific tasks. Although SGML is used widely in industry, government, and academe, its complexity makes it ill suited for use on the Web.
The Extensible Markup Language (XML) was devised to provide SGML's extensibility, structure, and data-checking for robust, large-scale Web applications [7]. Like SGML, XML is a metalanguage. Unlike the fixed format of HTML, XML allows authors to define their own elements to represent database schemas or object-oriented hierarchies. XML allows client applications to check data for structural validity. XML omits the more complex and less-used parts of SGML, such as the SGML declaration and tag minimization [8]. Because XML is a subset of SGML, existing SGML parsers and tools can be applied to XML documents. The World Wide Web Consortium (W3C) recently adopted XML as a standard [9,10], and several major software vendors (including Microsoft, Sun Microsystems, Netscape, Adobe, and IBM) support XML [11]. Efforts also are underway to incorporate XML into widely used health care standards such as HL7 [12].
An XML document's structure can be defined in a Document Type Definition (DTD) included within the document or referenced from an external source. The DTD specifies the names of the document's allowable elements, how often an element may appear, and the order in which elements appear. Elements consist of an opening tag (e.g., ), the element's contents, and a closing tag (). For "empty" elements that contain no other elements, the closing tag can be omitted by including a forward-slash character at the end of the opening tag (e.g., ). Attributes that further describe the element can appear within the opening tag's angle brackets.
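The element syntax described for XML (opening and closing tags, empty elements, and attributes) can be illustrated with a short Python sketch. The document fragment and the "units" attribute below are hypothetical and chosen only for illustration; they are not taken from the system described in this paper. Python's standard xml.etree.ElementTree parser checks that the document is well-formed but does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and the "units" attribute are
# illustrative assumptions, not the paper's actual report format.
DOC = """<Demography>
  <Age units="years">39</Age>
  <Sex><Male/></Sex>
</Demography>"""

root = ET.fromstring(DOC)          # raises ParseError if not well-formed
age = root.find("Age")
sex = root.find("Sex")

age_value = int(age.text)          # content between opening and closing tag
age_units = age.get("units")       # attribute given inside the opening tag
sex_value = sex[0].tag             # name of the empty element <Male/>
male_is_empty = sex[0].text is None and len(sex[0]) == 0

print(age_value, age_units, sex_value, male_is_empty)
```

Here the selected sex is carried by the name of an empty child element rather than by text content, which mirrors the "(Male | Female)" style of content model.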
1091-8280/98/$5.00 © 1998 AMIA, Inc.
We explored the use of the Extensible Markup Language (XML) to mediate between components of the computer-based patient record (CPR). We sought to integrate existing Web-based systems for structured reporting (SPIDER) and probabilistic decision support (BANTER).
METHODS
Structured Reporting SPIDER (Structured Platform-Independent Data Entry and Reporting) uses open information standards to achieve platform-independent entry of structured reports [13,14]. SPIDER can accommodate a variety of reporting applications that contain hierarchically organized concepts [15]. Reporting concepts can be linked to external vocabularies such as the Unified Medical Language System (UMLS) Metathesaurus [16,17].
The current model was derived from an existing BN for diagnosis of acute abdominal pain [21]. It included the patient's age and sex, five patient history questions, and five physical examination findings (Table 1). To simplify the model, we excluded nodes related to imaging findings. The diagnoses of interest were gallstones, cholecystitis, appendicitis, gastroenteritis, and small bowel obstruction. The patient's age and sex influenced the presence of gallstones, which in turn influenced the presence of cholecystitis. All of the diagnoses other than gallstones influenced the various symptoms and signs.
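The influence structure described above can be written down as a directed graph. The following Python sketch is illustrative only: it lists just the five history items named in Figure 3, omits the physical findings, and assumes that every diagnosis other than gallstones influences every finding, which the text does not state explicitly. The node names and the parents dictionary are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

DIAGNOSES = ["Gallstones", "Cholecystitis", "Appendicitis",
             "Gastroenteritis", "Small-bowel-obstruction"]
# History items only; the physical findings are omitted in this sketch.
FINDINGS = ["Anorexia", "Vomiting", "Diarrhea", "Obstipation",
            "Similar-prior-symptoms"]

# Map each node to the nodes that influence it (its parents).
parents = {"Age": [], "Sex": []}
for diagnosis in DIAGNOSES:
    parents[diagnosis] = []
parents["Gallstones"] = ["Age", "Sex"]
parents["Cholecystitis"] = ["Gallstones"]
# Assumption: each non-gallstone diagnosis influences each finding.
for finding in FINDINGS:
    parents[finding] = [d for d in DIAGNOSES if d != "Gallstones"]

# A Bayesian network must be acyclic; static_order() raises CycleError
# otherwise and yields every node after all of its parents.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

An ordering in which parents precede children is the order in which conditional probability tables would typically be specified or sampled.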
SPIDER presents the reporting concepts as familiar graphical objects such as text windows, checkboxes, and radio buttons in Web data-entry forms. The hierarchy of concepts is preserved; the client program can display the form elements with appropriate levels of indentation. From the data entered through the Web interface, SPIDER can create textual reports or XML documents. The XML documents include a report-specific DTD that defines the allowable data fields and values; the resulting report is thus both portable and self-defining [13,14]. SPIDER currently runs on a Sun Netra i5 Internet Server (Sun Microsystems, Palo Alto, CA) and the Netscape Enterprise Server 3.0 (Netscape Communications, Mountain View, CA). SPIDER's software is written in the Perl programming language (version 5.0).
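As a rough illustration of a self-defining report, the sketch below builds an XML document that carries its own DTD in an internal subset. It is written in Python rather than Perl and is not SPIDER's code; the "Report" root element, the CONTENT_MODELS table, the build_report function, and the use of EMPTY for value elements are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical report definition: element name -> DTD content model.
CONTENT_MODELS = {
    "Report": "(Age?, Diarrhea?)",
    "Age": "(#PCDATA)",
    "Diarrhea": "(Present | Absent)",
    "Present": "EMPTY",
    "Absent": "EMPTY",
}

def build_report(age, diarrhea):
    """Return an XML document, with an internal DTD, for the entered values."""
    if diarrhea not in ("Present", "Absent"):
        raise ValueError("diarrhea must be 'Present' or 'Absent'")
    root = ET.Element("Report")
    ET.SubElement(root, "Age").text = str(age)
    # The chosen value is recorded as an empty child element.
    ET.SubElement(ET.SubElement(root, "Diarrhea"), diarrhea)
    declarations = "\n".join(
        f"  <!ELEMENT {name} {model}>" for name, model in CONTENT_MODELS.items()
    )
    body = ET.tostring(root, encoding="unicode")
    return (f'<?xml version="1.0"?>\n'
            f"<!DOCTYPE Report [\n{declarations}\n]>\n{body}")

document = build_report(39, "Present")
print(document)
```

Because the declarations travel with the data, a receiving program needs no prior knowledge of the form that produced the report. Note that ElementTree can parse such a document but does not enforce the DTD; a validating parser would be needed for that.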
Table 1 - Data elements used by the decision support system.
Decision Support BANTER (Bayesian Network Tutoring and Explanation) provides decision support using Bayesian networks (BNs) as its knowledge model [18,19]. BNs represent probabilistic knowledge graphically [20]. Each node in the BN graph represents a stochastic variable; the probability values of the node's two or more possible states (e.g., "present" and "absent") sum to 1. Arcs between variables represent influence, expressed as conditional probabilities.
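The smallest possible case, one disease node with one finding node as its child, shows how such a network turns conditional probabilities into a diagnosis probability. The numbers in this Python sketch are hypothetical and are not taken from BANTER or from the acute abdominal pain model; the names prior, likelihood, and posterior are likewise illustrative.

```python
# Hypothetical probabilities, for illustration only.
# Each node has two states whose probabilities sum to 1.
prior = {"present": 0.1, "absent": 0.9}            # P(disease)
likelihood = {                                     # P(finding | disease)
    "present": {"present": 0.8, "absent": 0.2},
    "absent": {"present": 0.1, "absent": 0.9},
}

def posterior(finding_state):
    """Return P(disease | finding = finding_state) by Bayes' rule."""
    joint = {d: prior[d] * likelihood[d][finding_state] for d in prior}
    total = sum(joint.values())                    # P(finding = finding_state)
    return {d: p / total for d, p in joint.items()}

result = posterior("present")
print(result)
```

Inference in a larger network applies the same rule across many nodes at once; practical systems use specialized algorithms rather than this direct enumeration.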
[Table 1 body not recoverable from the source. Surviving fragments: categories Demography and Physical Finding; data elements Age (years), Sex, Diarrhea, Obstipation, Similar prior symptoms (identifier PRIOR-SX), and Rigidity; UMLS concept identifiers C0221152, C0005903, C0238547, C0159060, C0234246, C0240877, and C0427512; allowed values "Present; Absent".]
Figure 1 - XML document defines data elements for reporting application and decision support model.
RESULTS
Figure 2 - Web-based data-entry interface.
Of the 27 concepts in the model (including categories such as "Patient History"), only one could not be matched to a UMLS Metathesaurus concept: we assigned the ad-hoc identifier "PRIOR-SX" to the term "Similar prior symptoms." Otherwise, the concept identifier (ID) attribute in the report-definition document (Figure 1) corresponded to the concept unique identifier (CUI) in the UMLS Metathesaurus.
The decision support module used the XML-based structured report to instantiate the Bayesian network's nodes and evaluate the probabilities of the diagnoses. From the data, BANTER correctly interpreted the values, instantiated the model, and calculated the diseases' probabilities. The output was displayed through the Web interface.
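The step of turning a structured report into evidence for the network can be sketched as follows. This Python example is an assumption-laden illustration, not BANTER's implementation: the element names follow the DTD fragment in Figure 3, but the document instance, the extract_evidence function, and the choice to skip unreported items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical report instance using element names from Figure 3.
REPORT = """<Acute-Abdominal-Pain>
  <Demography><Age>39</Age><Sex><Male/></Sex></Demography>
  <History>
    <Diarrhea><Present/></Diarrhea>
    <Vomiting><Absent/></Vomiting>
  </History>
</Acute-Abdominal-Pain>"""

def extract_evidence(xml_text):
    """Map each reported data element to its entered value."""
    evidence = {}
    root = ET.fromstring(xml_text)
    for category in root:              # e.g. Demography, History
        for item in category:          # e.g. Age, Sex, Diarrhea
            if len(item):              # value given as an empty child element
                evidence[item.tag] = item[0].tag
            elif item.text and item.text.strip():
                evidence[item.tag] = item.text.strip()
    return evidence

evidence = extract_evidence(REPORT)
print(evidence)
```

Data elements absent from the report (here, for example, Anorexia) simply do not appear in the result, so the corresponding network nodes would be left uninstantiated.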
Document type definition (element name: content model)
Acute-Abdominal-Pain: (Demography?, History?, Physical-Findings?)
Demography: (Age?, Sex?)
Age: (#PCDATA)
Sex: (Male | Female)
Male: ()
Female: ()
History: (Anorexia?, Vomiting?, Diarrhea?, Obstipation?, Similar-prior-symptoms?)
Anorexia: (Present | Absent)
Vomiting: (Present | Absent)
Diarrhea: (Present | Absent)
Obstipation: (Present | Absent)
Similar-prior-symptoms: (Present | Absent)
Present: ()
Absent: ()
Document instance (surviving fragments): the value 39 and the closing tag </Diarrhea>
Figure 3 - Part of the XML document type definition and document instance generated by SPIDER.
DISCUSSION
The suite of tools described here offers a means to create and implement structured reporting applications that use the open technologies of XML and the World Wide Web, and to integrate those reports with interactive decision support. XML enables the delivery of self-describing data structures of arbitrary depth and complexity. XML retains SGML's key advantages of extensibility, structure, and validation, but is significantly easier to learn, use, and implement than full SGML.
The current system employs server-based software for structured data entry, report generation, and computation (inference) of diagnostic probabilities. The use of XML is particularly attractive in light of the Web's ability to incorporate distributed software modules, such as Java applets, that embed powerful data manipulation capabilities into Web clients [7]. XML documents can serve as the objects that are operated upon by such distributed software components. A prototype version of SPIDER used Java applets for structured data entry based on SGML [23]. Java programs and applets also have been developed for Bayesian networks. The system described here can be extended to incorporate client-side Java software for data entry and Bayesian-network inference. XML has the potential to facilitate the integration of data entry, decision support, and other components of the evolving computer-based patient record.
ACKNOWLEDGMENTS Supported in part by a Biomedical Engineering Research Grant from The Whitaker Foundation (C.E.K.). The authors thank Ondrej Zoltan for computer systems support, Gary P. Barnas, MD, for critical review of the manuscript, and the National Library of Medicine for access to the UMLS Knowledge Sources Server.
References
[1] Lowe HJ, Lomax EC, Polonkey SE. The World Wide Web: a review of an emerging Internet-based technology for the distribution of biomedical information. J Am Med Informatics Assoc 1996; 3:1-14.
[2] Cimino JJ. Beyond the superhighway: exploiting the Internet with medical informatics. J Am Med Informatics Assoc 1997; 4:279-284.
[3] Kassirer JP. The next transformation in the delivery of health care [editorial]. N Engl J Med 1995; 332:52-53.
[4] International Standards Organization. ISO 8879:1986 Information processing - Text and office systems - Standard Generalized Markup Language (SGML). Geneva: International Standards Organization, 1986.
[5] Goldfarb CF. The SGML Handbook. Oxford: Clarendon Press, 1990.
[6] van Herwijnen E. Practical SGML. (2nd ed.) Boston: Kluwer Academic Publishers, 1994.
[7] Bosak J. XML, Java, and the future of the Web. In: Connolly D, ed. XML: Principles, Tools, and Techniques. Sebastopol, CA: O'Reilly, 1997: 219-228. (Khare R, ed. The World Wide Web Journal; vol 2, no. 4).
[8] Clark J. Comparison of SGML and XML: World Wide Web Consortium Note. World Wide Web Consortium. 15 December 1997.
[9] Connolly D, Bosak J. Extensible Markup Language (XML). World Wide Web Consortium. 31 October 1997.
[10] Flynn P, Allen T, Borgman T, et al. Frequently Asked Questions about the Extensible Markup Language (The XML FAQ) Version 1.3. 1 June 1998.
[11] EDI News. XML touted as cure for EDI ills. CommerceNet. 8 August 1997.
[12] HL7 SGML/XML Special Interest Group. The Kona Proposal. Health Level Seven. 19 January 1998.
[13] Kahn CE Jr. Self-documenting structured reports using open information standards. Medinfo 1998 (in press).
[14] Kahn CE Jr. SGML for self-defining structured reports. Int J Med Informatics 1998 (in press).
[15] Kahn CE Jr. A generalized language for platform-independent structured reporting. Methods Inf Med 1997; 36:163-171.
[16] Lindberg DAB, Humphreys BL, McCray AT. The Unified Medical Language System. Methods Inf Med 1993; 32:281-291.
[17] Humphreys BL, Lindberg DAB, Schoolman HM, Barnett GO. The Unified Medical Language System: an informatics research collaboration. J Am Med Informatics Assoc 1998; 5:1-11.
[18] Haddawy P, Jacobson J, Kahn CE Jr. Generating explanations and tutorial problems from Bayesian networks. Proc AMIA Annu Fall Symp 1994; :770-774.
[19] Haddawy P, Jacobson J, Kahn CE Jr. BANTER: a Bayesian network tutoring shell. Artif Intel Med 1997; 10:177-200.
[20] Jensen FV. An introduction to Bayesian networks. New York: Springer Verlag, 1996:208.
[21] Haddawy P, Kahn CE Jr, Butarbutar M. A Bayesian network model for radiological diagnosis and procedure selection: work-up of suspected gallbladder disease. Med Phys 1994; 21:1185-1192.
[22] McCray AT, Razi AM, Bangalore AK, Browne AC, Stavri PZ. The UMLS Knowledge Source Server: a versatile Internet-based research tool. Proc AMIA Annu Fall Symp 1996; :164-168.
[23] Huynh PN, Kahn CE Jr. Structured entry of medical data using SGML, Java, and the World Wide Web. In: AMIA 1996 Spring Congress Abstract Book. Bethesda, MD: American Medical Informatics Association, 1996: 100 (abstract).
Contact information: E-mail: [email protected] URL: http://www.mcw.edu/midas/spider/