A Highly Scalable Information System as Extendable ... - CiteSeerX

Medical Informatics in a United and Healthy Europe K.-P. Adlassnig et al. (Eds.) IOS Press, 2009 © 2009 European Federation for Medical Informatics. All rights reserved. doi:10.3233/978-1-60750-044-5-101


A Highly Scalable Information System as Extendable Framework Solution for Medical R&D Projects

Silke HOLZMÜLLER-LAUE a,1, Bernd GÖDE b, Regina STOLL c, Kerstin THUROW a
a celisca – Center for Life Science Automation Rostock, Germany
b Institute of Automation, University of Rostock, Germany
c Institute of Preventive Medicine, University of Rostock, Germany

Abstract. Research projects in preventive medicine need flexible information management that supports free planning and documentation of project-specific examinations. The system should allow simple, preferably automated data acquisition from several distributed sources (e.g., mobile sensors, stationary diagnostic systems, questionnaires, manual inputs) as well as effective data management, data use and analysis. An information system fulfilling these requirements has been developed at the Center for Life Science Automation (celisca). This system combines data from multiple investigations and multiple devices and displays them on a single screen. The integration of mobile sensor systems for convenient, location-independent capture of time-based physiological parameters, together with the ability to observe these measurements directly in the system, enables new scenarios. The web-based information system presented in this paper is configurable through user interfaces. It covers medical process descriptions, operative process data visualizations, user-friendly process data processing, modern online interfaces (databases, web services, XML) as well as convenient support for extended data analysis with third-party applications.

Keywords. medical information system, data management in research projects, automated distributed data acquisition, framework solution

1. Introduction

Preventive medicine focuses on health care and the early detection of diseases and risks. It deals with comprehensive investigations and the acquisition of physical and mental human responses at the workplace, at home or during leisure time in order to explore cause-effect chains and correlations between several parameters. Therefore a variety of physiological parameters, subjective ratings and objective evaluations needs to be recorded. The examination process is usually not fixed. In addition, the devices used differ in their communication protocols and data formats. The documentation of such investigations contains time-based physiological measurements and questionnaires from mobile sensor systems, data from stationary diagnostic systems, standardized questionnaires, personal information of test persons and laboratory values [1]. These data from distributed sources are captured and summarized, and they are the basis for the exploration of new scientific findings and knowledge. Conventional manual capture, aggregation and analysis of raw data are laborious, error-prone and time-consuming. The huge amount of collected data requires the application of information and communication technologies. Further advantages of a central information system are the possibility of reusing data for other research tasks and a simplified execution of multicenter studies, since centrally accessible data simplify the exchange of information between the involved research groups and scientists. The realization as a web application assists interdisciplinary research projects at distributed locations with specific access rights for individual users and user groups.

1 Corresponding Author: Silke Holzmüller-Laue, Center for Life Science Automation Rostock (celisca), F.Barnewitz-Str. 8, 18119 Rostock, Germany; E-mail: [email protected].

2. Concept for a Framework Solution

Because research projects differ widely in their tasks, structures and specific requirements, a general information system for the storage of the collected data has to be very flexible. A few years ago, documentation and archiving were the primary focus. Currently, communication and the assistance of exploration processes are becoming more and more important [2]. The information system developed at the Center for Life Science Automation (celisca) fulfils these needs with a framework concept. It offers several options for data capture, data processing, data visualization, search and export. The advantage of this framework approach is its flexibility and its simple extensibility. There are five specific frameworks for the current main tasks of this information system:
1. A process description framework that is based on a parameter model with different data types and a process hierarchy with nine levels. This framework allows the description of research projects and their networked processes with an arbitrary level of detail.
2. A filter and filter-application framework for user-defined selection of data of interest.
3. A computation engine for simple user-defined formulas, plus special and more complex data processing algorithms provided by a library.
4. A visualization toolbox with several techniques for the visual analysis of selected numeric, nominal or multiparameter data.
5. A communication framework (configurable interfaces with task management for automated data import via database, XML, web services, or other standardized interfaces).
The information system is not limited to a special application field such as preventive medicine. At celisca the system has been used for several years in other fields of the life sciences such as biological screening or chemical synthesis and analysis [3, 4].
Users adapt this scalable system to their needs through an arbitrary abstraction of processes and an adaptable system configuration. Application-field-specific parameters and methods can be defined by the users and are stored in extendable libraries. Several visualization techniques and computation algorithms can be configured for the application field. This high scalability and the just-in-time customization make the information system independent of any particular application domain. This paper presents the functionalities of the information system using the example of research projects in preventive medicine.
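The parameter model and the user-extendable libraries described above can be illustrated with a minimal sketch. It is not the celisca implementation: the class names `DataType`, `Parameter`, `ExaminationStep` and `MethodLibrary`, the three data types shown, and the example parameter names are assumptions made for illustration only (the term "Examination Step" is taken from Section 3.2).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class DataType(Enum):
    """A small, hypothetical subset of the data types a parameter may have."""
    NUMBER = "number"
    TEXT = "text"
    TIME_SERIES = "time_series"


@dataclass(frozen=True)
class Parameter:
    """User-defined parameter: a name, a data type and an optional unit."""
    name: str
    data_type: DataType
    unit: str = ""

    def accepts(self, value: Any) -> bool:
        """Check whether a value matches the declared data type."""
        if self.data_type is DataType.NUMBER:
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        if self.data_type is DataType.TEXT:
            return isinstance(value, str)
        # Time series: a list of (timestamp, value) pairs.
        return isinstance(value, list) and all(
            isinstance(p, tuple) and len(p) == 2 for p in value
        )


@dataclass
class ExaminationStep:
    """A reusable group of parameters, e.g. the fields of one data entry form."""
    name: str
    parameters: List[Parameter] = field(default_factory=list)

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return the names of parameters that are missing or ill-typed."""
        return [
            p.name
            for p in self.parameters
            if p.name not in record or not p.accepts(record[p.name])
        ]


class MethodLibrary:
    """In-memory stand-in for an extendable library of examination steps."""

    def __init__(self) -> None:
        self._steps: Dict[str, ExaminationStep] = {}

    def store(self, step: ExaminationStep) -> None:
        self._steps[step.name] = step

    def load(self, name: str) -> ExaminationStep:
        return self._steps[name]


# Usage: define a step once, store it, and reuse it to validate a record.
library = MethodLibrary()
library.store(ExaminationStep("resting_measurement", [
    Parameter("heart_rate", DataType.NUMBER, "1/min"),
    Parameter("strain_rating", DataType.TEXT),
]))
step = library.load("resting_measurement")
print(step.validate({"heart_rate": 62, "strain_rating": 3}))  # ['strain_rating']
```

Keeping the type check on the parameter rather than on the form means that a step composed from library parameters inherits validation without extra code, which is one plausible way to obtain the configurability the text describes.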



3. Methods

3.1. Distributed Expandable Infrastructure

Because of the required flexibility of the workflows documented in the information system and the quantity of collected data, a distributed system concept is preferred for infrastructure and functionality. It covers measurement, storage of raw data, pre-processing and the integration of process data into the documentation within the medical information system. The scalable distributed resources with server character are:
1. Raw data server with process databases and a file system as logical storage unit.
2. Network nodes with protocols for remote procedures at the application level for pre- and post-processing.
3. Application server with the medical information system as web application.
The large amounts of data and the pre- and post-processing algorithms, with their increasing need for computation time and processing power, are distributed to a variable number of server nodes. Thus the infrastructure can be scaled to meet future performance demands. In medical studies, information acquisition is becoming increasingly decentralized and automated. Mobile sensor systems offer new possibilities for investigations of stress and fitness. With these systems, measurements are executed without impact on the mobility and activity of the test person. An exemplary application uses a multiparameter mobile sensor system for recording time-based physiological parameters during a given stress situation. Additionally, test persons enter parameters on the mobile device that describe subjective assessments during the executed activities, e.g., their current physical and psychological state, in order to obtain a subjective strain rating. This information has to be added chronologically to the measurement data within asynchronous or synchronous sequences and has to be stored in a process database on the raw data server.
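The chronological combination of sensor measurements with asynchronously entered subjective ratings can be sketched as a merge of time-ordered streams. The `Sample` record, the function name `merge_chronologically` and the example values are hypothetical; the sketch assumes that each stream is already ordered by timestamp and says nothing about how the actual system stores the result.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    """One time-stamped value from a sensor or from the test person."""
    timestamp: float  # seconds since the start of the examination
    source: str       # e.g. "sensor" or "test_person"
    name: str         # parameter name
    value: object


def merge_chronologically(*streams: Iterable[Sample]) -> Iterator[Sample]:
    """Merge individually time-ordered streams into one time-ordered sequence."""
    return heapq.merge(*streams, key=lambda s: s.timestamp)


# Synchronous sensor series (one value per second) ...
sensor = [
    Sample(0.0, "sensor", "heart_rate", 71),
    Sample(1.0, "sensor", "heart_rate", 73),
    Sample(2.0, "sensor", "heart_rate", 90),
]
# ... and an asynchronous subjective rating entered on the mobile device.
ratings = [Sample(1.5, "test_person", "strain_rating", 4)]

for sample in merge_chronologically(sensor, ratings):
    print(sample.timestamp, sample.name, sample.value)
```

`heapq.merge` works lazily, so the same approach would also suit long recordings that are read from a database cursor rather than held in memory.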
Integrating these process databases on the raw data server into the medical information system by direct access minimizes the access time to the process data, keeps the data current and prevents redundancies. A further benefit is the possibility of observing the data acquisition online in order to detect errors or interfering factors and to eliminate them in time. These online observations are supported by several universal and application-specific methods integrated into the visualization toolbox.

3.2. The Process Description Framework

The intention of the flexible medical information management system is the complete workflow-oriented documentation of an a priori unknown medical research process. The workflow description merges all relevant information from distributed sources with arbitrary levels of detail. Workflows can be structured sequentially, in parallel or hierarchically. The basis is an open parameter approach for describing process activities and a general nine-level process hierarchy for describing the structure of the research project. The professional user defines parameters and combines them into “Examination Steps” and “Examination Procedures” that can be stored in a method library for reuse in other projects. The term “Parameter” is used for the description of any hierarchy level of the processes and of the data with their content and properties. It covers all relevant data types (numbers, formatted text, images, time series, structures, files, …) as well as associated information classes. Synchronous and asynchronous time series are special structure elements that are very important in examinations in the field of preventive medicine. Parameters are not only containers for measurements, process variables or documentation elements, but also for input and output conditions or characterized process states, which allow a quick process interpretation by algorithms, by event-driven process handling or by manual process control. The proposed and realized parameter concept is the basis for a flexible design of data entry masks for the validated manual collection of examination data or questionnaires. Comprehensive questionnaires are used to ask the test persons about existing and former diseases, therapies, specifics of lifestyle, subjective wellbeing and sociodemographic facts. By defining examination steps and procedures, data entry forms can be composed arbitrarily. The hierarchical workflow description merges physiological parameters measured with mobile sensor systems, data from mobile and paper-based standardized questionnaires, personal data of test persons, results of interviews based on data entry forms, data from stationary diagnostic systems and laboratory values. All this information, complemented by secondary derived data, can be analyzed together to find correlations and new knowledge.

3.3. Data Acquisition and Integration with the Communication Framework

Data sets from manual investigations and from automated examination procedures with stationary diagnostic devices in distributed systems have different data formats. The automated import of these distributed data from external systems is done by a communication framework. This framework supports common interfaces such as structured text, XML, CSV and Excel files as well as process databases and web services. The flexible mapping of the received data sets to the examination parameters is carried out by a dedicated syntax and semantic converter for easy process adaptation. This converter is based on the definition of mapping rules.
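A rule-based converter of this kind might look like the following sketch. The rule structure (`MappingRule`), the function `import_rows`, the semicolon-separated export format and all column and parameter names are assumptions made for illustration; the paper does not specify the rule syntax of the actual converter.

```python
import csv
import io
from typing import Callable, Dict, List, NamedTuple


class MappingRule(NamedTuple):
    """Maps one field of an external data set to one examination parameter."""
    source_field: str                 # column name in the external export
    parameter: str                    # parameter name in the information system
    convert: Callable[[str], object]  # syntax conversion of the raw string


def decimal_comma(text: str) -> float:
    """Convert a number written with a decimal comma, e.g. '72,5'."""
    return float(text.replace(",", "."))


def import_rows(export: str, rules: List[MappingRule]) -> List[Dict[str, object]]:
    """Translate every row of a semicolon-separated export using the rules."""
    reader = csv.DictReader(io.StringIO(export), delimiter=";")
    return [
        {rule.parameter: rule.convert(row[rule.source_field]) for rule in rules}
        for row in reader
    ]


# Hypothetical device export and the rule set defined once for this source.
export = "ID;HR;SysBP\nP01;72,5;120\n"
rules = [
    MappingRule("ID", "test_person", str),
    MappingRule("HR", "heart_rate", decimal_comma),
    MappingRule("SysBP", "systolic_bp", int),
]
print(import_rows(export, rules))
```

Because the rules are plain data, a rule set written for one source can be stored and applied again to later exports with the same structure.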
These rules can be defined after a structure analysis of the external process data sets and enable the translation of all corresponding description elements within the data set. This mapping has to be defined only for the first import from a source and can be reused in subsequent examinations as long as the source format is unchanged.

3.4. Assistance of Data Analysis and Interpretation with Visualization Toolbox, Computation Engine and Filter-Application Framework

Integrating suitable visualization techniques into the information system reduces the time spent on exporting data into specific analysis software. These techniques are a key factor for understanding complex information derived from large amounts of data. The user identifies errors or other important information and can use this information for an online interaction or for future investigations. The visualization toolbox is a collection of universal and application-specific visualization methods. Several integrated visualization methods provide a fast overview of nominal, numerical and multivariate (table) data. Thus interesting aspects, outliers or tendencies within the group of test persons can be identified quickly, and the scientist can react if necessary. Visual analysis across investigations is possible. The raw measurements collected by sensor systems can be defective. Before these data are processed, a correction, a data completion and possibly a filtering of the data are required. Special algorithms for the correction of sensor-specific errors can be applied. Furthermore, several algorithms and statistical functions are used in studies of preventive medicine for the computation of secondary process parameters from raw data or for the aggregation of data (for example, a reduction of temporal resolution by averaging a process parameter). The aggregated data are added to the process documentation dynamically. Complex methods such as the modelling of individual or group-specific profiles for the prediction of parameters can be used too. Realization as a web service is an efficient way to integrate any algorithm into the information system. Such complex application-specific algorithms and visualizations are integrated as filter applications. The data of interest are selected by a user-defined filter and act as input data for a visualization technique or a special computation algorithm. Furthermore, data processing is supported by a spreadsheet-based engine for simple user-defined computations [5] and by several export interfaces to third-party applications for extensive statistics, data mining and so on.
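The aggregation example mentioned above (reducing temporal resolution by averaging) and the filter-application pattern can be sketched together. The function names `downsample_mean` and `filter_application`, the record layout and the sample values are hypothetical; the sketch only illustrates the idea that a user-defined filter selects data which then serve as input to an algorithm.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

Series = List[Tuple[float, float]]  # (timestamp in seconds, value)


def downsample_mean(series: Series, window: float) -> Series:
    """Reduce temporal resolution by averaging all values per time window."""
    buckets = defaultdict(list)
    for timestamp, value in series:
        buckets[int(timestamp // window)].append(value)
    return [(index * window, mean(values)) for index, values in sorted(buckets.items())]


def filter_application(
    records: List[dict],
    selects: Callable[[dict], bool],
    algorithm: Callable[[Series], Series],
) -> Dict[str, Series]:
    """Apply an algorithm to the heart-rate series of every selected record."""
    return {
        record["test_person"]: algorithm(record["heart_rate"])
        for record in records
        if selects(record)
    }


records = [
    {"test_person": "P01", "age": 34,
     "heart_rate": [(0, 70), (10, 74), (20, 78), (60, 90), (70, 94)]},
    {"test_person": "P02", "age": 58,
     "heart_rate": [(0, 80), (30, 84)]},
]

# Filter: test persons younger than 40; algorithm: one mean value per minute.
result = filter_application(
    records,
    selects=lambda record: record["age"] < 40,
    algorithm=lambda series: downsample_mean(series, 60),
)
print(result)  # {'P01': [(0, 74), (60, 92)]}
```

Passing the algorithm as a callable keeps the selection logic independent of the computation, so the same filter could feed a visualization routine instead.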

4. Conclusion

This paper discusses a framework solution for an information system for research projects in preventive medicine. The main requirements for this information system are process adaptability, flexible process networking and a variable process and data distribution. The solution is a framework system consisting of modules for workflow description, process communication, filter and filter-application, data visualization, and process data computation. The framework system covers the following IT solutions for application in preventive medicine research: electronic patient records, structured investigation processes with flexible online data acquisition, several possibilities for user-definable pre- and post-processing, and automated interfacing to third-party applications. The process documentation covers all collected and aggregated data, from simple nominal or numerical values to time-based measurement series, ranging in complexity from e-records of test persons to the manually or automatically acquired data of different investigations. These data are available in one central system for process control, analysis and exploration of new knowledge. The presented system concept for a medical information system and its realization have been validated in a research project in which strain investigations at automated laboratory workplaces were performed with over 50 test persons. The information system is dynamically reconfigurable and adaptable to changing applications in other fields of the life sciences.

References

[1] Stoll, R., Kreuzfeld, S., Weippert, M., Vilbrandt, R., Stoll, N. (2007) System for flexible field measurement of physiological data of operators working in automated labs. Journal of the Association for Laboratory Automation (JALA) 12(2):110–114.
[2] Haas, P. (2005) Medizinische Informationssysteme und elektronische Krankenakten. Springer, Berlin.
[3] Thurow, K., Göde, B., Dingerdissen, U., Stoll, N. (2004) Laboratory information management systems for life science applications. Organic Process Research & Development 8:643–650.
[4] Göde, B., Holzmüller-Laue, S., Haller, D., Schneider, I., Thurow, K. (2007) Flexible IT-Plattform zur automatisierten HTS-Wirkstoffanalyse. GIT 51(9):741–744.
[5] Göde, B., Holzmüller-Laue, S., Rimane, K., Thurow, K. (2007) Integrierte flexible Datenverarbeitung in einem webbasierten LIMS: Idee und Praxis eines Excel-Prozessors in Serverapplikationen. Chemie – Ingenieur – Technik 12:2043–2049.
