thus, perform all required actions to convert, aggregate, or anonymize data of the .... presented as web pages (HTML / XHTML) on different devices using built-in.
Proceedings of the First International DELOS Conference, Tirrenia, Italy, 2007/2 (c) Springer Berlin / Heidelberg: Lecture Notes in Computer Science 4877/2007
Management of and Access to Virtual Electronic Health Records M. Springmann1 , L. Bischofs2 , P. M. Fischer3 , H.-J. Schek4 and H. Schuldt1 , U. Steffens2 , and R. Vogl5 1
Database and Information Systems Group, University of Basel, Switzerland 2 OFFIS Oldenburg, Germany 3 Institute of Information Systems, ETH Zurich, Switzerland 4 University of Konstanz, Germany 5 HITT - health information technologies tirol, Innsbruck, Austria
Abstract. Digital Libraries (DLs) in eHealth are composed of electronic artefacts that are generated and owned by different healthcare providers. A major characteristic of eHealth DLs is that information is under the control of the organisation where data has been produced. The electronic health record (EHR) of patients therefore consists of a set of distributed artefacts and cannot be materialised for organisational reasons. Rather, the EHR is a virtual entity. The virtual integration of an EHR is done by encompassing services provided by specialised application systems into processes. This paper reports, from an application point of view, on national and European attempts to standardise electronic health EHR. From a technical perspective, the paper addresses how services can be made available in a distributed way, how distributed P2P infrastructures for the management of EHRs can be evaluated, and how novel contentbased access can be provided for multimedia EHRs.
1
Introduction
eHealth Digital Libraries contain electronic artefacts that are generated by different healthcare providers. An important observation is that this information is not stored at one central instance but under the control of the organization where data has been produced. The electronic health record (EHR) of patients therefore consists of a set of distributed artefacts and cannot be materialized for organizational reasons. Rather, the EHR is a virtual entity and has to be generated by composing the required artefacts each time it is accessed. The virtual integration of an EHR is done by encompassing services provided by specialized application systems into processes. A process to access a virtual EHR encompasses all the services needed to locate the different artefacts, to make data from the different healthcare providers available, to perform the format conversations needed, and to present the result to a user. This requires an infrastructure that is highly dependable and reliable. Physicians must be given the guarantee that the system is always available. Moreover, the infrastructure has to allow for the transparent access to distributed data, to provide a high degree of scalability,
2
Springmann, Bischofs, Fischer, Schek, Schuldt, Steffens, Vogl
and to efficiently schedule the access to computationally intensive services by applying sophisticated load balancing strategies. Physicians need information immediately in order to make vital decisions. Long response times due to a high system load cannot be tolerated. An example is content-based similarity search across a potentially large set of documents which requires the availability of several building blocks, some of them being highly computationally intensive, which have to be combined to processes. This paper gives an overview on various aspects of management and access to virtual EHRs and research that have been carried out in this field within the last three years. The paper is organized as follows: Section 2 describes health@net, a novel national Austrian eHealth IT infrastructure which allows the access to health records across collaborating institutions. Section 3 explains how complex processes spanning over several institutions can be modelled and executed and thus, perform all required actions to convert, aggregate, or anonymize data of the virtual EHRs. Section 4 shows how similarity search technology can be used to enhance retrieval of EHRs. Since security and permissions are essential for such sensitive data as health records, Section 5 gives an outlook on how the behaviour of such distributed peer-to-peer can be simulated. Section 6 concludes.
2
health@net
Establishing EHRs in hospitals has been the central issue for hospital IT management in the past more than ten years. In the industrialized countries, a very good coverage with hospital IT systems for documentation of patient related data and their integration to form a common EHR has now been reached (for the European situation, see the HINE study 2004, as presented in the proceedings of the EU high level conference eHealth2006: Rising to the challenges of eHealth across Europe’s regions). Currently, the main issue for the European countries, as called for by the European eHealth action plan of 2004, is the establishment of national EHR systems that allow for the standardized exchange of electronic health data between healthcare providers with the vision of European interoperability. In view of this, the Austrian Ministry of Health (MoH) has initiated the Austrian eHealth Initiative in early 2005 – a stakeholders group which delivered the Austrian eHealth strategy to the Austrian Ministry of Health at the end of 2005. The health@net project [1] did very actively and influentially contribute to the formulation of the strategy and to subsequent consultations with the MoH in the course of 2006, ensuring the elaborate Digital Library ideas for distributed archive systems and high level security mechanisms were maintained in that strategy and implemented in a subsequent feasibility study commissioned by the MoH. The health@net team has developed a reference implementation [2] of its distributed secure architecture (see Figure 1) for shared EHRs with special focus on secure web services with service and user role based access control (RBAC) relying on the SECTET toolkit developed by the project partners at the Institute for Software Engineering at Innsbruck University. Pilot projects utilizing this architecture for topical EHR applications (e. g., online access to health
Management of and Access to Virtual Electronic Health Records
3
Fig. 1. The health@net Architecture and its Utlization for Access to Multimedia Documents Across Healthcare Organizations. The Functional Components are realized as Web Services (AN: Access Node; DMDI: Distributed Meta Data Index; DR: Document Repository; PL: Ratient Lookup; PLI: Patient Lookup Index; GI: Global Index; PEP: Policy Enforcement Point).
record data between the hospital trusts in Vienna (KAV) and Tirol (TILAK); a haemophilia register for quality insurance and epidemiological studies; an autistic children therapy register; a portal for report access by a group of general practitioners) have been commissioned and are currently in implementation. A certification of the compliance with IHE (integrating the Healthcare Enterprise) has been prepared.
3
XL System
XL [3] is a system which provides an environment for definition, composition, and deployment of Web services on a set of different platforms. XL Web services are expressed by the XL language. XL is specifically designed for processing XML data as the most common data representation within the Web service domain. The XL language uses the XQuery/XSLT 2.0 types throughout the whole system, and further uses XQuery expressions (including XML Updates) for processing XML. XML message passing in XL is based on the XML SOAP (Web Services) standard. A full implementation of XL has been available since 2002. There is also comprehensive tool support: There is an Eclipse Plug-in for XL which includes syntax-driven editing, debugging, and simple deployment. XL is deployable on any Platform supporting at least Java 1.3, including application servers and gateways to existing systems, personal computers of physicians and scientific personnel as well as patients, and even embedded devices like PDAs and mobile phones.
4
Springmann, Bischofs, Fischer, Schek, Schuldt, Steffens, Vogl
Virtual EHRs need to deal with distributed artefacts of medical data (from different healthcare providers, health insurances, public authorities and the patients themselves) that cannot be materialized due to organizational reasons or privacy concerns. For this reason, it is necessary to perform tasks as: (1) locate the different artefacts, (2) make data from the different healthcare providers available, (3) perform the format conversions needed, (4) aggregate/anonymize the data, and (5) finally present the result to a user. XL can contribute solutions to those requirements, if the system is realized in the context of web services. The strengths of XL are in the areas of high-level format conversions, aggregation/ anonymization and presentation of results. Locating services that provide relevant artifacts is possible using either UDDI or more content-oriented services directories. XL might also be used to make data from the individual providers available as web services, but needs to be complemented by additional software for low-level format conversion and feature extraction. Results can be presented as web pages (HTML / XHTML) on different devices using built-in components. Orchestration and composition of services (e.g., running workflows defined by BPEL) allows for quick and easy development of new services based on open standards. As a demonstrator for the usage of XL in the context of Virtual EHRs, the rapid development of web service to integrate some medical data will be shown. The service demonstrated here collects information about the patient age for cases of a given illness from two health providers which make that data available as web services. The data transfer format and the call parameters are different, however. The service retrieves all the necessary information from all the services and transforms it into a common result document that only contains the age information for each cases (presumably for reasons of anonymity). To keep the demonstrator short and concise, no advanced features of web service interaction and data manipulation are shown here. They can be added to the example easily when required. The development is done using the XL platform and the development environment XLipse, as shown in Figure 2. To start developing, a new XL Project needs to be created, following the usual Eclipse approach. To this project, a file with the name find illness.xl is added. The relevant service and operation signature are added. Inside the operation, the variable for the case list is initialized as an XML fragment. The first web service (a hospital) is called; this one supports a parameter to specify the illness. The result is stored in the variable $hospital 1. The age information for each of the hospitalizations in that variable are now added to the case list. The second provider only allows querying all cases after a given date, so all cases after a date in the past a retrieved into $healthP rovider 2. Since this provider also does not have a strict schema, we retrieve all reports off illnesses (regardless of their position in the document) and keep those that have a diagnosis corresponding to the requested illness. The age information for those cases is then added to the case list. Finally, the case list is returned.
Management of and Access to Virtual Electronic Health Records
5
Fig. 2. Screen-shot of the XL system
4
Similarity Search for EHRs
Electronic health records have become a very important patient-centred entity in health care management. They represent complex documents that comprise a wide diversity of patient-related information, like basic administrative data (patient’s name, address, date of birth, name of physicians, hospitals, etc.), billing details (name of insurance, billing codes, etc.), and a variety of medical information (symptoms, diagnoses, treatment, medication, X-ray images, etc.). Each item is of a dedicated media type, like structured alphanumeric data (e.g., patient name, address, billing information, laboratory values, documentation codes, etc.), semi-structured text (symptoms description, physician’s notes, etc.), images (X-ray, CT, MRT, images for documentation in dermatology, etc.), video (endoscopy, sonography, etc.), time series (cardiograms, EEG, respiration, pulse rate, blood pressure, etc.), and possibly others, like DNA sequence data. Stateof-the-art medical information systems aggregate massive amounts of data in EHRs. Its exploitation, however, is mainly restricted to accounting and billing purposes, leaving aside most medical information. In particular, similarity search based on complex document matching is out of scope for today’s medical information systems. We believe that efficient and effective matching of patient records forms the rewarding foundation for a large application variety. Data mining applications that rest upon our patient record matching approach will foster clinical research and enhanced therapy, in general. Specifically, our approach allows for effective comparison of similar cases to (1) shorten necessary treatments, (2) improve diagnoses, and (3) ensure the proper medication to improve patients’ well-being and reduce cost. This can be achieved, by no longer resting entirely on exact matching, but employing the paradigm of similarity search to the content of EHRs. In particular, health records at present contain a vast number of medical images, stored in electronic form, that captures information about the patients
6
Springmann, Bischofs, Fischer, Schek, Schuldt, Steffens, Vogl
Fig. 3. ISIS Demonstrator
health and progress of treatments. This information is only used indirectly for retrieval in the form of the metadata stored in the DICOM header in PACS system or the diagnosis of a radiologist. State-of-the-art image retrieval technology can assist the retrieval by using the medical images for search in addition to traditional search approaches. One example of such a retrieval system is ISIS(Interactive SImilarity Search), which is a prototype application for information retrieval in multimedia collections built at ETH Z¨ urich [4] and has been extended at UMIT and the University of Basel. It supports content-based retrieval of images, audio and video content, and the combination of any of these media types with sophisticated text retrieval [5]. A set with more than 50,000 medical images, which sum up to roughly 4.65 gigabytes of data, has been inserted and indexed with the ISIS system. The files originate from four different collections. All of them have been used for the Medical Image Retrieval Challenge for the Cross Language Evaluation Forum 2005 (ImageCLEFmed 2005, http://ir.ohsu.edu/image/ [6]). The results of our experiments as well as the overall experience described for ImageCLEFmed is that the combined retrieval using visual and textual information improves the retrieval quality compared to queries using only one source of information, either visual or textual. A variation of this setting is the automatic classification of medical images. This approach is also addressed as a task in ImageCLEF and one goal in this approach in clinical practice is, to be able check the plausibility e.g., of DICOM header information and other kind of automatically or manually entered metadata in order to reduce errors. The recent activities performed at University o
Management of and Access to Virtual Electronic Health Records
7
Basel in this area focused on the improvement of the retrieval quality and also the retrieval time for such classification tasks using the ImageCLEF benchmark. As shown in [7], a long running classification task could be speed up by factors in the range of 3.5 to 4.9 using an early termination strategy and by using parallel execution on 8 CPUs even to a total speed up of 37.34.
5
RealPeer Simulation Environment
eHealth Digital Libraries connect a number of different collaborating institutions which store different parts of virtual EHR. Which part of a record is stored by which institution is determined by the institutions’ role concerning their patients’ medical care. From a Digital Library point of view this calls for a flexible support by a system architecture which enables the combination of distributed artefacts against the background of different organisational and topical contexts, offering simple and fast access to physicians in charge on the one hand and on the other hand guaranteeing autonomy and, in particular, confidentiality to institutions producing and owning specific health record content. Bischofs and Steffens [8] introduces a new approach for a class of peer-topeer systems based on combined overlay networks, which are able to map organisational structures to peer-to-peer systems and which are therefore especially suitable for the problem at hand. However, the evaluation of such new, not yet existing systems is difficult. One approach for analysing the behaviour of such systems is simulation. For this purpose, the time discrete, event-based peer-to-peer simulation tool eaSim [9] and the RealPeer Framework [10] have been developed. eaSim allows the evaluation of peer-to-peer search methods and RealPeer supports the simulation-based development of peer-to-peer systems. Currently, we are developing the new integrated simulation environment RealPeer SE, which joins the functionality of eaSim and RealPeer. Additionally, RealPeer SE resolves some drawbacks of the eaSim simulation model and allows the simulation-based development of peer-to-peer systems with combined overlays. Unlike other peer-to-peer simulators which primarily focus on concrete systems, RealPeer SE follows a generic approach considering different overlay topologies and search methods. The graphical user interface of RealPeer SE (see Figure 4) assists the user throughout the simulation process. The overlays of the simulation model are visualized by using the Java Universal Network/Graph Framework (JUNG). Predefined metrics help understanding the dynamic behaviour and efficiency of the simulated peer-to-peer system. These metrics are visualized by using the JFreeChart library. Figure 5 contains an example of a simulation result demonstrating four different metrics concerning a search in a peer-to-peer system with the following combined overlays. The first overlay is based on Chord [11], which builds a ring topology of peers and allows inter-organisational queries. The second overlay contains the relationships within organisational units (e.g., institutions) to allow an intra-organisational breadth-first search. Both overlays are interconnected through the physical network so that messages can be routed
8
Springmann, Bischofs, Fischer, Schek, Schuldt, Steffens, Vogl
Fig. 4. Graphical user interface of the RealPeer Simulation Environment
from one overlay to another. In the simulation experiment described in this paper, 100 search queries have been started from a random node at a random point in time in a peer-to-peer system containing 250 nodes. The queries search for artefacts of type File (e.g., EHR) within a specific organisational unit (e.g., an institution). Therefore the query message is first routed within the Chord overlay to find the organisational unit. Then, the query is routed within the organisational unit to find the artefacts. Figure 5(a) depicts the number of messages in the peer-to-peer system at simulation time. In Figure 5(b) we can see that all query messages have been answered. The load of particular nodes is shown in Figure 5(c). Figure 5(d) depicts the number of query hops until an answer is received. The described simulation experiment demonstrates the abilities of RealPeer SE. However, for more meaningful simulation results, we need experiment settings based on empirical data. Therefore, our future work includes not only the further development of the RealPeer SE but also the planning, execution and analysis of simulation experiments. After that, RealPeer SE can be used for the simulation-based development of an executable system prototype.
6
Conclusions and Outlook
Recent developments, for instance in medical imaging, have led to significant improvements in data acquisition large increases in quantity (but also in quality) of medical data. Similarly, in health monitoring, the proliferation of sensor technology has strongly facilitated the continuous generation of vast amounts of physiological data of patients. In addition to this information, more traditional
Management of and Access to Virtual Electronic Health Records Number of Messages
9
Answered Queries Relation 1,05
90
1,00
85
0,95
80
0,90 0,85
75
0,80 0,75
65
Answered Queries/Queries
Number of Messages in System
70
60 55 50 45 40 35 30
0,70 0,65 0,60 0,55 0,50 0,45 0,40 0,35 0,30
25
0,25 20
0,20
15
0,15
10
0,10
5
0,05
0
0,00 0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
105
0
5
10
15
20
25
30
35
40
45
Simulation Time Number of Messages
Number of Response Messages
55
60
65
70
75
80
85
90
95
100
105
Answered Queries Relation
(a) Number of Messages in System
(b) Answered Queries Relation
Messages per Node
Hops to Answer 260
13
240
12
220
11
200
Number of Answers
10
Number of Nodes
50
Simulation Time
Number of Query Messages
9 8 7 6 5 4
180 160 140 120 100 80 60
3
40
2
20
1 0
0 -10
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
Number of Messages
(c) Messages per Node
190
200
210
220
230
-2
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
30
32
34
36
38
40
42
44
46
48
50
52
Number of Hops
(d) Hops to Answer
Fig. 5. Simulation results of an experiment with 100 search queries in a peer-to-peer system with 250 nodes
structured data on patients and treatments needs to be managed. In most cases, data is owned by the healthcare organization where it has been created and thus needs to be stored under the organization’s control. All this information is part of the EHRs of a patient, thus is included in eHealth Digital Libraries. A major characteristic of eHealth DLs is that information is under the control of the organization where data has been produced. The EHR patients therefore consists of a set of distributed artefacts and cannot be materialized for organizational reasons. Rather, the electronic patient record is a virtual entity. The virtual integration of an electronic patient record is done by encompassing services provided by specialized application systems into processes. In this paper, we have addressed several major issues in the management of and access to virtual EHRs. First, this includes platforms for making data and services available in a distributed environment while taking security and privacy into account. Second, it addresses support for efficient and effective search in these virtual electronic health records. The concrete activities that are reported in this paper have been subject to joint work within the task Management of and Access to Virtual Electronic Health Records of the DELOS network of excellence.
References 1. Schabetsberger, T., Ammenwerth, E., Breu, R., Hoerbst, A., Goebel, G., Penz, R., Schindelwig, K., Toth, H., Vogl, R., Wozak, F.: E-health approach to link-up actors in the health care system of austria. Stud Health Technol Inform 124 (2006) 2. Vogl, R., Wozak, F., Breu, M., Penz, R., Schabetsberger, T .and Wurz, M.: Architecture for a distributed national electronic health record system in austria. In: In Proceedings of EuroPACS 2006, Trondheim (Norway), 15-17 June 2006.
10
Springmann, Bischofs, Fischer, Schek, Schuldt, Steffens, Vogl
3. Florescu, D., Gr¨ unhagen, A., Kossmann, D.: Xl: a platform for web services. In: CIDR. (2003) 4. Mlivoncic, M., Schuler, C., T¨ urker, C.: Hyperdatabase infrastructure for management and search of multimedia collections. In: Digital Library Architectures: Peer-to-Peer, Grid, and Service-Orientation, Pre-proceedings of the Sixth Thematic Workshop of the EU Network of Excellence DELOS, S. Margherita di Pula, Cagliari,Italy, 24-25 June, 2004. (2004) 25–36 5. Springmann, M.: A novel approach for compound document matching. Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL) 2(2) (2006) 6. Clough, P., M¨ uller, H., Deselaers, T., Grubinger, M., Lehmann, T.M., Jensen, J.R., Hersh, W.R.: The clef 2005 cross-language image retrieval track. In: Accessing Multilingual Information Repositories, 6th Workshop of the Cross-Language Evalution Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers. Volume 4022 of Lecture Notes in Computer Science. (2005) 535–557 7. Springmann, M., Schuldt, H.: Speeding up idm without degradation of retrieval quality. In Nardi, A., Peters, C., eds.: Working Notes of the CLEF Workshop 2007. (September 2007) 8. Bischofs, L., Steffens, U.: Organisation-oriented Super-Peer Networks for Digital Libraries. In Agosti, M., Schek, H.J., T¨ urker, C., eds.: Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures: 6th Thematic Workshop of the EU Network of Excellence DELOS, S. Margherita di Pula, Cagliari, Italy, 2425 June, 2004. Revised Selected Papers. Volume 3664 / 2005 of Lecture Notes in Computer Science., Springer (2005) 45–62 9. Bischofs, L., Giesecke, S., Gottschalk, M., Hasselbring, W., Warns, T., Willer, S.: Comparative evaluation of dependability characteristics for peer-to-peer architectural styles by simulation. Journal of Systems and Software (02 2006) 10. Hildebrandt, D., Bischofs, L., Hasselbring, W.: Realpeer - a framework for simulation-based development of peer-to-peer systems. In: Proceedings of the 15th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2007), Naples, Italy, 7.-9. Februar, Los Alamitos, CA, USA, IEEE (2007) 490–497 11. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord - a scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, San Diego, California, USA, ACM Press (2001) 149–160