Connecting Medical Informatics and Bio-Informatics R. Engelbrecht et al. (Eds.) ENMI, 2005
A Multi-Source Information System via the Internet for End-Stage Renal Disease: Scalability and Data Quality

Mohamed Ben Saïd (a), Loic Le Mignot (a), Claude Mugnier (a), Jean Baptiste Richard (a), Christine Le Bihan-Benjamin (a), Jean-Philippe Jais (a), Didier Guillon (b), Ana Simonet (b), Michel Simonet (b), Paul Landais (a)

(a) Université Paris-Descartes; Faculté de Médecine; Assistance Publique-Hôpitaux de Paris; EA222; Service de Biostatistique et d’Informatique Médicale, Hôpital Necker, 149 rue de Sèvres, 75743 Paris Cedex 15, France
(b) Université J. Fourier, TIMC-IMAG, Grenoble, France
Abstract
A Multi-Source Information System (MSIS) has been designed for the Renal Epidemiology and Information Network (REIN), which is dedicated to End-Stage Renal Disease (ESRD). MSIS aims to provide reliable follow-up data for ESRD patients. It is based on an n-tier architecture made up of a universal client and a dynamic Web server connected to a production database and to a data warehouse. MSIS has been operational since 2002 and has been progressively deployed in 9 regions in France. It includes 11,500 patients. MSIS facilitates the documentation of medical events that occur during the course of ESRD patients’ health care, and provides the means to control the quality of each patient’s record and to reconstruct the patient’s trajectory of care. Consolidated data are made available to a data warehouse and to a geographic information system for analysis and data representation in support of public-health decision making.

Keywords: Scalability; Data quality; Dynamic Web server; n-tier architecture; Multi-Source Information System; Internet; End-Stage Renal Disease.
1. Introduction
The lack of coordinated information about patients suffering from end-stage renal disease (ESRD) led a large panel of health care providers, decision makers, researchers and institution representatives to initiate the Renal Epidemiology and Information Network (REIN) [1]. REIN is organized, at national and regional levels, around a network of professionals involved in ESRD health care. A Multi-Source Information System (MSIS) [2], dedicated to collecting continuous and exhaustive records of all ESRD cases and their clinical follow-up, was developed by Necker Hospital at University Paris-Descartes. MSIS collates, in a standardized representation, a minimal patient record elaborated by health professionals [3]. Progressive deployment in the regions since 2002 has made it possible to test MSIS performance, acceptance among users, workload impact and maintenance cost effectiveness. MSIS is operational via the Internet in eight regions plus one virtual region devoted to following paediatric cases.
Section 13: Public Health Informatics, Clinical Trials
Scalability is a major issue in the design of an information system, especially during the implementation and deployment phases, and it remains one as the system grows [4]. In a general definition, scalability is an overlap between structural scalability and load scalability [5]: “Structural scalability is the ability of a system to expand to a chosen dimension without major modifications to its architecture”. A system is thought of “as being structurally scalable if its implementation or standards do not impede the growth of the number of objects it encompasses, or at least will not do so within a chosen time frame”. “Load scalability is the ability of a system to perform gracefully as the offered traffic increases”. A system is said to have “load scalability if it has the ability to function gracefully, without undue delay and without unproductive resource consumption or resource contention at light, moderate, heavy loads while making good use of available resources” [5].

In a previous paper [3], scalability was presented as one of the aims of MSIS. It influenced the design, architecture and implementation of MSIS. In the present paper we focus on the efficiency of the technological choices made to support the organizational network of professionals at the regional level, in order to build a reliable, methodologically sound longitudinal information resource for a nationwide cohort of ESRD patients.

2. Material and Methods

Organizational support
The REIN national committee for guidance and follow-up involves several organizations: Société de Néphrologie, Société francophone de dialyse, INSERM, Paris Descartes University and Grenoble J. Fourier University, Agence de la Biomédecine, Caisse Nationale d’Assurance Maladie des Travailleurs Salariés (CNAMTS), Direction de l’Hospitalisation et de l’Organisation des Soins (DHOS), Institut de Veille Sanitaire (InVS) and representatives of patients’ associations.
Regional committees involve nephrologists, decision makers, public health insurers, epidemiologists and patients’ associations. Each region elects a nephrologist as program coordinator. In each region, a public health and epidemiology department supports the professionals and decision makers by providing resources and expertise for methodology and epidemiology studies. A clinical research assistant performs quality control on every patient record at least once a year.

Architecture and technical support

Architecture
MSIS is based on an n-tier architecture interfaced with a lightweight universal client. MSIS uses a secure connection via the Internet. The client connects to the middle tier, which communicates with several databases: an identification database, a production database, a data warehouse and a geographical information system (Figure 1).

Security
Authorized users validate a certificate issued by the MSIS system in order to exchange SSL-encrypted messages. Client messages are analyzed by the firewall [6], which operates at the network level. The firewall translates public IP addresses into local ones in order to filter IP addresses and TCP port accesses. Allowed messages are sent to the proxy [7], a software component that rewrites public URL addresses and checks their validity and access authorization. The proxy makes access-control decisions for traffic to and from the production zone, with which it communicates through the firewall. Outgoing messages to the client follow the reverse path (Figure 2).
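The proxy step described above (rewrite a public URL, check that it is valid, check that the caller is authorized) can be illustrated with a minimal sketch. It is not the MSIS implementation: the routing table, the host names, the path prefixes and the profile names are all hypothetical, chosen only to show the shape of such a check.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical routing table (not from the paper):
# public path prefix -> (internal base URL, user profiles allowed to use it)
ROUTES = {
    "/msis/records": ("http://production.internal:8080/records",
                      {"nephrologist", "research_assistant"}),
    "/msis/identity": ("http://identification.internal:8080/identity",
                       {"nephrologist", "research_assistant"}),
    "/msis/extract": ("http://production.internal:8080/extract",
                      {"research_assistant"}),
}


def route_request(public_url: str, profile: str) -> Optional[str]:
    """Return the internal URL for an allowed request, or None if it is refused."""
    parts = urlsplit(public_url)
    if parts.scheme != "https":
        return None  # only encrypted traffic is accepted
    for prefix, (internal_base, allowed_profiles) in ROUTES.items():
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            if profile not in allowed_profiles:
                return None  # known URL, but this profile may not access it
            suffix = parts.path[len(prefix):]
            query = "?" + parts.query if parts.query else ""
            return internal_base + suffix + query
    return None  # unknown public URL


if __name__ == "__main__":
    print(route_request("https://example.org/msis/records/42", "nephrologist"))
    print(route_request("https://example.org/msis/extract", "nephrologist"))
```

Refusing by default (returning None for any URL that is not explicitly listed) keeps the production zone unreachable unless a route has been declared, which is the usual design choice for a component of this kind.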
Figure 1: MSIS n-tier architecture. Lightweight universal clients connect from 8 administrative regions plus a “pediatrics virtual” region: Limousin (L), Languedoc-Roussillon (LR), Champagne-Ardenne (CA), Provence-Alpes-Côte d’Azur (PACA), Centre (C), Ile-de-France (IDF), Basse-Normandie (BN), Midi-Pyrénées (MP). The middleware uses free software (Linux™, Apache, Tomcat JSP/Servlet™ container, and the Java™ programming language). It is organized into a dynamic web server tier, interfaced with the web client tier, and a business logic tier, which interacts with the databases in the information system tier.
An intrusion detection system audits all the devices and analyzes connections according to an updated list of threats.

Deployment
On the client side, MSIS relies on existing local Internet networking facilities and on a computer configuration that is widespread in medical settings: a Pentium III processor or equivalent, 128 megabytes of random access memory (RAM), a 1024x768 pixel screen resolution, Acrobat Reader™ 4, and a web browser supporting 128-bit SSL encryption (Internet Explorer 6.0, Netscape 7.0, Safari 2.0, Mozilla Firefox 1.0). Maintenance and upgrades are performed centrally, which reduces deployment costs and delays.

Use of the system in the regions
The patient record is organized into three parts:
• a medical history, with the aetiology of ESRD and comorbidity at the start of replacement therapy,
• a recent medical observation, with information about access to the care facilities and to the national kidney-graft waiting list,
• an update of the current renal dialysis method and context of treatment.
Admission, discharge and transfer events are documented, and the record is updated annually on the anniversary of the first ESRD treatment. A death record, including the standard medical coding of the cause of death, is documented when necessary. The coordinating nephrologists and the clinical research assistant relay local training and are the main contacts for daily use of the system. They also provide an exhaustive list of dialysis units and request authorizations and profiles for the nephrologists in the region. MSIS access codes are delivered individually to the users.

Exhaustiveness control
Monitoring tools in MSIS provide the means to control exhaustiveness on a monthly basis, and on an annual basis for the update of patients’ follow-up. The demand of care requires information about patients living in a neighbouring region and treated in the region of interest.
Figure 2: MSIS security system organization. Incoming and outgoing traffic passes through the firewall. Controls are made at the networking level. The proxy rewrites URL addresses and controls URL access-authorizations to and from the production zone. An intrusion detection listener audits the traffic.
Nephrologists of the neighbouring region are given access to MSIS in order to complete the missing information.

Quality control
More than ten mandatory information items are required to create a patient record in the MSIS production database. A randomized study focuses on the quality of the information stored in MSIS in comparison with the information in the patient’s hospital record. Appropriate correlation coefficients or kappa coefficients are calculated for quantitative and qualitative variables, respectively. The clinical research assistant carries out this work with the support of the epidemiology department.

MSIS functionalities
MSIS provides tools that give users at the dialysis unit level access to reporting lists and descriptive statistics. At the regional level, specific quality control functions are available to the coordinating nephrologists and to the clinical research assistant: they consist of regional summary information and reminders for annual follow-up, as well as a data extraction module. The latter is organized to help answer the questions raised by the project: better knowledge of the demand for ESRD care, and matching the offer of care to the demand. The demand of care is derived from the data of patients living in the region, whether or not they are treated in the region. The offer of ESRD care concerns the patients treated in the region during a given period of time, whether or not they reside in the region.

Identification server
A patient identification server provides a unique identifier for every patient in the system and prevents the creation of duplicates by detecting existing patient names with a close spelling. The unique patient identifier links the two kinds of information (identification data and medical data). The identification server algorithm relies on string comparison between the information entered by the user and the information stored in the MSIS database.
The method is derived from the Needleman and Wunsch (N&W) algorithm [8], which searches for similarities in strings. A value of “acceptable distance” between two strings is assigned. If the minimal computed distance is lower than or equal to the acceptable distance, the information entered is considered to have a match in the system.

The identification server complies with French privacy law. The necessary agreements of the Commission Nationale Informatique et Libertés (CNIL) were obtained.
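The duplicate-detection step of the identification server can be illustrated with a small sketch. It uses a plain global-alignment dynamic program in the style of Needleman and Wunsch with unit mismatch and gap costs, which is an assumption: the optimized variant used in MSIS is described in [8] and is not reproduced here. The function names, the normalization rule and the example names are hypothetical.

```python
def alignment_distance(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Minimal global-alignment cost between two strings.

    Needleman-Wunsch-style dynamic programming, written as a cost to
    minimize: a match costs 0, a mismatch or a gap costs 1 by default.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" with b[:j]
    for i, char_a in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] with ""
        for j, char_b in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if char_a == char_b else mismatch)
            gap_in_b = prev[j] + gap
            gap_in_a = curr[j - 1] + gap
            curr.append(min(substitute, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Upper-case and collapse whitespace (a hypothetical normalization)."""
    return " ".join(name.upper().split())


def find_candidates(entry, stored_names, acceptable_distance=1):
    """Return (distance, name) pairs within the acceptable distance, closest first."""
    entered = normalize(entry)
    scored = [(alignment_distance(entered, normalize(name)), name)
              for name in stored_names]
    return sorted(pair for pair in scored if pair[0] <= acceptable_distance)


if __name__ == "__main__":
    stored = ["DUPONT", "DURAND", "MARTIN"]  # hypothetical stored names
    print(find_candidates("Dupond", stored, acceptable_distance=1))
```

The choice of acceptable distance is a trade-off: a larger value catches more spelling variants but also proposes more unrelated patients, so in practice the candidates would be shown to the user for confirmation rather than merged automatically.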
CNIL recommends that explicit nominative data and medical data transit separately over the network. The MSIS interface was adapted accordingly: two separate request-response processes connect to the identification server and to the production database. The results are displayed “simultaneously” on the same screen, with the patient’s full name and identification information in one frame and the medical information in another frame. Since the implementation of the identification server in April 2004, the workload generated by the management of duplicates has dropped significantly and the data quality of the system has improved as a result.

Data consolidation and periodic feed of the data warehouse
The quality control process performed by the clinical research assistant provides consolidated information, which is then exported to the data warehouse. A geographical information system uses this information to dynamically generate graphical representations of the demand and offer of care. It is an aid for public health decision-making.

3. Results
As of May 24th 2005, 11424 patient records, 898 transplantations and 2295 deaths of ESRD patients are documented in the production database. The active file includes 8227 patients who undergo dialysis (detailed information is presented in Table 1). According to the national survey of prevalent ESRD dialyzed patients in June 2003 [9], the MSIS active file includes 26 % of the nationwide ESRD dialyzed patients.

MSIS and the organizational support
MSIS access codes were provided to more than 300 nephrologists, and 50 codes to their collaborators. Ten clinical research assistants and nine physicians perform the quality control at the regional level. Eight university departments of medical informatics, epidemiology and/or public health ensure the patient data quality control using MSIS.

Table 1: MSIS deployment in the regions

Region                        Date of inclusion   Population    Number of cases in the active file
Limousin                      Jan 22nd 2002          711 000      382
Languedoc-Roussillon          Jun 2nd 2002         2 296 000    1,405
Champagne-Ardenne             Jan 1st 2003         1 342 000      682
Centre                        Jan 1st 2004         2 440 000    1,306
Provence-Alpes-Côte-d’Azur    Jan 1st 2004         4 506 000    3,060
Ile-de-France                 Nov 1st 2004        10 952 000      615
Midi-Pyrénées                 Feb 1st 2005         2 552 000      285
Basse Normandie               Feb 1st 2005         1 422 000      378
Pediatrics virtual region     Jan 1st 2004               N/A      114
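As a quick consistency check on Table 1, the short sketch below sums the regional case counts and relates each count to the regional population. The figures are copied from the table; the per-million ratio is only a crude illustration added here, not a prevalence estimate from the paper, since regions joined at different dates and deployment was still in progress.

```python
# Rows of Table 1: (region, population, cases in the active file).
# The pediatrics virtual region has no population figure (N/A in the table).
TABLE_1 = [
    ("Limousin", 711_000, 382),
    ("Languedoc-Roussillon", 2_296_000, 1_405),
    ("Champagne-Ardenne", 1_342_000, 682),
    ("Centre", 2_440_000, 1_306),
    ("Provence-Alpes-Côte-d'Azur", 4_506_000, 3_060),
    ("Ile-de-France", 10_952_000, 615),
    ("Midi-Pyrénées", 2_552_000, 285),
    ("Basse-Normandie", 1_422_000, 378),
    ("Pediatrics virtual region", None, 114),
]


def total_active_cases(rows):
    """Sum of the cases in the active file over all regions."""
    return sum(cases for _, _, cases in rows)


def cases_per_million(rows):
    """Recorded cases per million inhabitants, for regions with a population."""
    return {
        region: cases / population * 1_000_000
        for region, population, cases in rows
        if population
    }


if __name__ == "__main__":
    print("Total active file:", total_active_cases(TABLE_1))
    for region, ratio in cases_per_million(TABLE_1).items():
        print(f"{region}: {ratio:.0f} recorded cases per million inhabitants")
```

The regional counts add up to the 8227 patients of the active file reported in the text.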
As longitudinal data accumulate, annual follow-up of the ESRD cohort is progressively and systematically taking place in the regions.

4. Discussion: scalability issues
New components, whether a database server, business logic or technologies such as XML data processing, fit within the n-tier architecture. Only a few environmental changes occurred during the last 3 years: migration to Java 1.4, to Tomcat 5.0 and to MySQL 4.0. The reinforcement of system security was an addition to MSIS rather than a rebuild. Few changes to the data structure were necessary. This was the case when transactions were introduced into the system, or when responding to the nephrologists’ requests to add new items to the patient record. The conceptual data model remained unaffected. The implementation of the identification server fitted within the modular n-tier architecture. Additional feedback functionalities focused on information retrieval and quality control.

Scalability issues evolve with the use of the system. While three years of online use and progressive deployment in the regions have confirmed the choices made at the design and implementation phases, a new workload has been identified and needs to be addressed. It points to the need for further resources to support the efforts of the clinical research assistants in guaranteeing the quality of the patient information entered in the system. MSIS has shown great stability and good performance as its use was extended to new regions.

5. Conclusion
MSIS has proved its usability and its support to nephrologists and health care decision makers in following a cohort of ESRD patients and in providing reliable longitudinal information for epidemiology and public health decision-making in the context of ESRD.

6. Acknowledgments
This research was funded by a grant from STIC-Santé-Inserm 2002, n°A02126ds, by Paris-Descartes University and by Assistance Publique-Hôpitaux de Paris. This work was also supported by grants provided by the Caisse Nationale d’Assurance Maladie des Travailleurs Salariés, the Institut de Veille Sanitaire and the Agence de la Biomédecine. The nephrologists in charge of the ESRD units of Limousin, Languedoc-Roussillon, Champagne-Ardenne, Provence-Alpes-Côte-d’Azur, Centre, Ile-de-France, Basse-Normandie and Midi-Pyrénées, and the pediatricians, are especially acknowledged for their fruitful cooperation and comments. JP Necker and X Ferreira are acknowledged for their skilful help.

7. References
[1] Landais P. L’insuffisance rénale terminale en France : offre de soins et prévention. Presse Med. 2002;31:176-85.
[2] Landais P, Simonet A, Guillon D, Jacquelinet C, Ben Saïd M, Mugnier C, Simonet M.
SIMS@REIN: Un Système d’Information Multi-Sources pour l’Insuffisance Rénale Terminale. CR Biol 2002;325:515-528.
[3] Ben Said M, Simonet A, Guillon D, Jacquelinet C, Gaspoz F, Dufour E, Mugnier C, Jais JP, Simonet M, Landais P. A dynamic Web application within an n-tier architecture: a Multi-Source Information System for end-stage renal disease. Stud Health Technol Inform. 2003;95:95-100.
[4] Arlitt M, Krishnamurthy D, Rolia J. Characterizing the Scalability of a Large Web-based Shopping System. ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, Pages 44-69.
[5] Bondi AB. Characteristics of Scalability and Their Impact On Performance. Proceedings of the second international workshop on Software and performance. WOSP 2000, Ontario, Canada - ACM 2000 1-58113-195-X/00/09.
[6] Al-Tawil K, Al-Katham IA. Evaluation and Testing of Internet Firewalls. Int. J. Network Mgmt. 1999;9:135-149.
[7] Burnside M, Clarke D, Mills T, Maywah A, Devadas S, Rivest R. Proxy-Based Security in Networked Mobile Devices. Proceedings of the 2002 ACM symposium on Applied computing - Madrid, Spain 2002:265-272.
[8] Le Mignot L, Mugnier C, Ben Saïd M, Jais JP, Le Bihan C, Richard JB, Taupin P, Landais P. Avoiding Doubles in Distributed Nominative Medical Databases: Optimization of the Needleman and Wunsch Algorithm. MIE 2005 (in press).
[9] http://www.ameli.fr/174/DOC/1182/dp.html (document published on January 8th 2004 and accessed on May 23rd 2005) or http://www.fehap.fr/sanitaire/dialyse/PrevalenceIRC09012004.pdf (document published on January 15th 2004 and accessed on May 23rd 2005).
Address for correspondence
Pr Paul Landais, Service de Biostatistique et d’Informatique Médicale, Hôpital Necker-Enfants Malades, 149, rue de Sèvres, 75743 Paris cedex 15 - France.