Development and Deployment of an Internet-Based Data Management System for Use by the Asthma Clinical Research Network

Robert M. Curley, MS, Richard L. Evans, MS, James Kaylor, BS, Rosanne M. Pogash, MPA, and Vernon M. Chinchilli, PhD, for the Asthma Clinical Research Network

Department of Health Evaluation Sciences, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania

ABSTRACT: Data management system development for the first Asthma Clinical Research Network (ACRN) study began at the data coordinating center (DCC) in May 1995, with a requirement to deliver a production system by November 1995. Special methods were needed to establish an internet-based local area network (LAN), place client systems at the clinical centers, and achieve an accelerated software development cycle. Because a fully integrated data management system could not be developed before the study started, an early analysis focused on identifying discrete groupings of data management functions, so that distinct database modules could be developed to provide specific functionality such as subject randomization, subject registration, and data entry. The modules were categorized as associated with either the clinical centers or the DCC, so that the clinical center modules could be developed and delivered in time for the start of the study. In the second phase of development, during the relatively slow patient-enrollment period, the DCC modules were delivered one at a time. Although this development model was originally a necessity imposed by limited DCC resources, it continues to be used today because it permits the DCC to implement ACRN studies more rapidly and efficiently. This paper describes the methodologies used to develop an internet-based LAN, establish clinical center client systems, establish DCC client and server operations, and develop a data management system. It describes the circumstances that shaped these systems and the special methodologies developed for them. The technical aspects of the data management system and LAN are presented, together with the requirements and constraints analyses used to develop the hardware and software systems.
Control Clin Trials 2001; 22:135S–155S © Elsevier Science Inc. 2001

KEY WORDS: Clinical trials data management, internet-based database

Address reprint requests to: Richard L. Evans, MS, Research Computing Unit, The Department of Health Evaluation Sciences, Penn State College of Medicine, PO Box 855, MC A210, Hershey, PA 17033-0855.
Received August 25, 2000; accepted July 23, 2001.

Controlled Clinical Trials 22:135S–155S (2001)
© Elsevier Science Inc. 2001
655 Avenue of the Americas, New York, NY 10010

0197-2456/01/$–see front matter PII S0197-2456(01)00174-X


R.M. Curley et al.

INTRODUCTION

The Asthma Clinical Research Network (ACRN) was established in 1993 by the Division of Lung Diseases (DLD), National Heart, Lung, and Blood Institute (NHLBI). It is a multicenter group designed to conduct multiple studies of a single disease, asthma. The ACRN is dedicated to conducting well-designed clinical trials for rapid evaluation of new and existing therapeutic approaches to asthma and to disseminating laboratory and clinical findings to the health-care community. Six clinical centers and one data coordinating center (DCC) currently make up the network (see Appendix A). The DCC resides within the Department of Health Evaluation Sciences at the Pennsylvania State University. The department is composed of discipline-specific units, such as biostatistics, data management, and administration. Funding notification for the DCC was received in September 1993. Development of the data management system software and associated computer systems became the responsibility of the data management unit, the applications development unit, and the network and systems administration unit. The project is currently in its seventh fiscal year.

The data management system originally envisioned by the DCC required a computer system at each clinical center, located near the clinical facilities used by the clinic coordinator for study visits. To develop the design, a requirements analysis study was conducted [1] to determine the appropriate data management tools and systems. The clinical center and DCC software and hardware requirements follow.

REQUIREMENTS ANALYSIS FOR CLINICAL CENTER/DCC COMPUTER SYSTEMS

Prior to the implementation and deployment of the data management system for the first clinical trial (Beta-Agonist in Mild Asthma [BAGS]), the following data management system issues were analyzed and identified as requirements for all clinical trials.

Clinical Center Data Management System Requirements

1. The NHLBI requested that the data management system be distributed to the clinical centers. The functions available to a clinical center could range from data entry alone to nearly complete data entry and data management capability. After analyzing several options, the DCC staff chose to distribute remote first data entry to the clinical centers to enhance data quality: when first data entry occurs as close as possible to the source of the data collection, the highest quality data are produced [1]. Second data entry, verification, and validation with conflict resolution would occur at the DCC.
2. The literature on data coordinating centers and data management systems suggests developing on-line tools to help clinical center personnel perform their assigned tasks more efficiently. The DCC elected to deploy a distributed suite of productivity tools, including e-mail and an electronic calendar, to the clinical centers as part of the data management system.

Internet-Based Data Management System


3. The users of the data management system were an extremely heterogeneous group in terms of their familiarity with both computer systems and data management systems. In addition, the inevitable turnover of trained clinical center personnel meant that the DCC would have to train new personnel on a regular basis. The data management system therefore required a simple, rapidly deployable familiarization and training process. To support this goal, all system software was built with a graphical interface to permit rapid comprehension and ease of use.
4. A data delivery method had to be chosen: either “real-time” or “store-and-forward” data entry communications. In real-time network communications, the information is sent to the receiving party as soon as the operator types it into the computer. In a store-and-forward scenario, data entered into the computer are stored locally and forwarded to the receiving party at a later time. DCC staff determined that real-time data entry was most desirable, because it ensures that data entered at the clinical centers are immediately available at the DCC for data management and statistical purposes. Real-time e-mail communication with the clinical centers was also a necessity. The multiple, simultaneous clinical trials conducted by the network contributed significantly to this decision: fast-paced clinical trials and network-support functions required that delays in communications and data delivery be minimized.
5. Because of the ambitious schedule, the network had to accommodate rapid development and deployment of multiple, concurrently operating data management systems. Facilitating multiple, simultaneous clinical trials was the primary goal of the network and required that data management systems be available for all ongoing studies.
6. Any hardware acquired to support the network had to be compatible with the software features identified above.
7. Local printing capabilities were required.
8. To save on travel costs while still maintaining a wide area network (WAN), DCC personnel had to be able to configure and maintain the system remotely with minimal clinical center participation, and the hardware setup and training time required of clinical center staff had to be kept to a minimum.
9. The system had to provide reliable connectivity at the lowest possible network connectivity cost. The internet was chosen as the medium for the WAN because there were no direct charges associated with its use. However, the reliability of the internet was a concern, so a backup method of transferring data to and from the DCC had to be identified to ensure timely transfer of data when the internet was unavailable.
10. Any system deployed to a clinical center had to be compatible with the existing UNIX computing environment at the DCC.
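To make the real-time versus store-and-forward distinction in requirement 4 concrete, here is a minimal Python sketch. It is not the ACRN software: the `transport` callable that stands in for the network link, and the class and method names, are hypothetical. A real-time channel reports failure at once when the link is down, whereas a store-and-forward channel always accepts the entry locally and delivers it on a later `flush()`; the queue is also one way to picture the backup transfer path that requirement 9 calls for.

```python
from collections import deque
from typing import Callable, Deque

# Hypothetical network link: returns True if the record reached the DCC.
Transport = Callable[[dict], bool]


class RealTimeChannel:
    """Sends each record the moment it is entered; no local buffering."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def submit(self, record: dict) -> bool:
        # The DCC either has the record now, or the operator learns it failed.
        return self.transport(record)


class StoreAndForwardChannel:
    """Stores records locally and forwards them at a later time."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport
        self.pending: Deque[dict] = deque()

    def submit(self, record: dict) -> bool:
        # Always succeeds locally; delivery is deferred until flush().
        self.pending.append(record)
        return True

    def flush(self) -> int:
        """Forward queued records in entry order; stop at the first failure."""
        sent = 0
        while self.pending:
            if not self.transport(self.pending[0]):
                break  # link unavailable: keep the rest for the next attempt
            self.pending.popleft()
            sent += 1
        return sent
```

The trade-off shows up in the return values: the real-time `submit` tells the operator immediately whether the DCC has the data, which matches the DCC's stated preference, while store-and-forward hides outages from the operator at the cost of delayed availability at the DCC.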

DCC Data Management System Requirements

In addition to the requirements related to the clinical center data management system, there were specific requirements applicable to the DCC portion of the data management system.

11. A design tool was required to design and develop the standardized case report forms needed across multiple protocols. The tool had to handle complex form designs and allow easy, efficient form modifications.
12. The operating system software for the DCC project had to provide a standard user interface with a flexible and stable windowed computing environment. Additional requirements included a centralized electronic storage area for all project data and programs, compatibility with the existing computing environment within the DCC, and support for the multiple software products required by the project.
13. Using the internet as the primary WAN would require the DCC to develop and support redundant connectivity for this project, so that the clinical centers could reach the project data and databases regardless of the performance or availability of the internet.
14. Application software for relational database management, statistical analysis, and e-mail had to be acquired and made available to DCC personnel for data management and statistical analysis. The DCC data management system platform requirements addressed three areas: (a) the end-user/client-side computing environments, (b) the database computing/server-side environments, and (c) the network components providing the appropriate WAN connectivity.
15. The DCC project staff would require end-user/client-side platforms that run multiple concurrent windows to support several databases and productivity tools. The platforms were required to be homogeneous so that a platform could be replaced easily after a system failure; similar platforms also allow standardized training. DCC staff would maintain the platforms using the standard operating systems and networking technologies already in use within the DCC.
16. The DCC computing/server-side environment would require a dedicated database server at the DCC to host the database management system and electronic data storage areas. To satisfy the requirement for multiple simultaneous study protocols, the database server had to support concurrent databases. The project required large data storage capacity, sufficient to store all electronic data files generated throughout the project. DCC personnel would support the server.
17. The DCC database computing/server-side platform would require a test/development storage area for use during the development phases of the project. The test/development area would provide an environment for system testing and a staging area for testing new or upgraded operating systems, database software, and development software before they were used for production. A separate clinical center platform environment would be required at the DCC to replicate and test the actual clinical center environment and its interactions within the WAN.

CONSTRAINTS ANALYSIS

Once the requirements for the data management system software and hardware had been analyzed, a constraints analysis was conducted to determine which limitations might hinder system development. Descriptions of the constraints follow.

Material Constraints

The DCC had no prior experience developing with a graphical user interface (GUI) and did not have the necessary GUI environment and tools. A GUI development package had to be selected before the GUI tools could be constructed for the ACRN. The primary selection criteria for the software included quality, low cost, rapid procurement, compatibility with the existing DCC computing environment, and support for expedient development. An analysis of the relational database management system (RDBMS) then in use within the DCC revealed that it supported only character-based applications development and would need to be replaced with an RDBMS supporting graphical applications. The primary criteria for selecting an RDBMS included quality, low purchase cost, compatibility with the existing DCC computing environment, and graphical applications development capability.

Acquiring the GUI development tools was a major programming constraint. A development environment was established at the DCC during the last 2 weeks of March 1994, allowing development of the training data management system during April and May. This training system was used to instruct clinical personnel in June 1994. Upon completion of training, development of a pilot data management system for the first clinical trial began; the pilot system took three and a half months to develop. The acquisition and selection of the RDBMS became another critical undertaking before the first clinical trial began, because the DCC staff had to install and learn a new database management tool. Database development began at the DCC in mid-August 1994 with the installation of the necessary RDBMS software. Development continued until the beginning of September 1994.
The database portion of the data management system occupied the months of September through November 1994; a production system was delivered for the main BAGS study at the beginning of December 1994.

The absence of established internet connectivity at the clinical center sites was another constraint. The optimal computing model would use the internet as the medium connecting the clinical sites and the DCC, but each institution housing a clinical center had its own internet connectivity problems. These ranged from severe (a complete lack of institutional connectivity) to moderate (delays in connecting the clinical center). Although these difficulties have since been resolved, they were of serious consequence during the early days of the ACRN. Modem-based point-to-point protocol (PPP) connectivity, the eventual establishment of transmission control protocol/internet protocol (TCP/IP) network connectivity, and the deployment of local UNIX servers served as successive solutions to these problems.

Fiscal Constraints

The initial funding level provided for DCC activity proved inadequate. This is not an uncommon situation and is mentioned solely because of its significant effect on the development of the data management system. As work progressed from the first fiscal year through the third (September 1995 to August 1996), the DCC had to provide additional funding for the project to supplement the support provided by the sponsoring agency. In the first fiscal year, the DCC received adequate funding for approximately 6.75 full-time employees (FTEs) for the initial phase of establishing the project. In fiscal year two, sponsor funding provided 8.58 FTEs, while the DCC provided a documented level of activity consistent with 10.9 FTEs. In fiscal year three, sponsor funding provided 8.25 FTEs, and the DCC provided a level of activity of 10.16 FTEs. The DCC could not subsidize the project indefinitely, and it became necessary to justify a supplemental funding increase to the sponsoring agency. The DCC eventually determined that approximately 12.2 FTEs were required to complete the level of activity required by the steering committee. The DCC prepared a request for supplemental funding and submitted it to the NHLBI, which determined that a DCC site visit should precede its decision. Following the site visit, supplemental funding was granted, and the NHLBI provided funding for approximately 12.2 FTEs as of May 1996.

Personnel Constraints

Recruiting personnel was the first major undertaking before the implementation of the first clinical trial; however, the initial shortage of personnel and skills did not significantly affect data management system development. The DCC principal investigator and the center director demonstrated a serious commitment to acquiring the required personnel.

A STRATEGIC VIEW OF THE DEVELOPMENT PROCESS

Once the requirements and development constraints associated with the data management system and hardware had been identified, milestones were created to quantify the time required to create and deploy the system.
After reviewing the milestones, it was clear that the available time was barely sufficient and that special development methods would be required to meet the projected deadlines. The literature suggests that any development effort should be preceded by a strategic overview of the proposed project [2]. DCC personnel used enterprise architecture planning (EAP) to develop a strategic view of the project. EAP is an approach for planning data quality and achieving the information systems (IS) mission [3]. The IS mission is to provide (1) timely access to data, (2) a flexible and maintainable approach to systems development, (3) data integrity and standards, and (4) cost-effective data and systems integration. EAP does not involve the physical design and implementation of a system; rather, it provides a high-level blueprint (architecture) of data, applications, and technology that is a cost-effective long-term solution. Architectures (high-level blueprints or conceptual models) were developed to define and describe the data, applications, and technology needed to support the ACRN project and to achieve its IS goals. An analytical assessment was initiated to determine “where the DCC is today”; it involved business modeling and identification of the systems currently in place for both applications and supporting technology. DCC personnel compiled information about the business of being a DCC and studied the data accumulated and used by a DCC. Following this analysis, a planning process was used to establish “where we want the project to be in the future.”

DATA ARCHITECTURE

The primary goal of an IS project is to acquire quality data. Because planning requires sufficient detail about the data, the data architecture must be defined first. The major deliverables of the data architecture include diagrams that define data entities, entity attributes, and entity interrelationships. Short descriptions of the diagrams and processes follow.

Data Flow Development

A data flow diagram was developed to represent the acquisition, flow, and processing of all data associated with the clinical study. The diagram displays all processes performed manually and electronically, including those performed by the data management system and other software. Information was gathered about the specific activities associated with each process, including the skill set required of the individual performing it. The data flow diagram proved invaluable to the success of the development and, ultimately, to the successful processing of the data by the DCC.

Schema Development

Entity relationship diagrams were developed to model all database data and interrelationships [4]. The entity relationship diagrams were converted to table instance charts [4], which were used to develop the SQL scripts defining all necessary database tables. Version 7.0.16 of the Oracle server was used to construct the database.

RDBMS Development

Two discrete layers of programs constitute the RDBMS portion of the data management system.
The schema layer consists of the database objects, such as the tables that store the data from the clinical trial's case report forms, while the application layer provides a graphical interface through which clinical and data management personnel enter and manipulate data. The application layer was developed using Oracle Forms version 4.0.12 for use on a Sun SPARCstation running Solaris 2.3 with X-windows (OpenLook). Separate applications were developed for each of the data management system modules.

APPLICATIONS ARCHITECTURE

The applications architecture identified every application needed to manage the data associated with the project. A brief description of the process used to construct the applications follows.

Software Analysis and Design Methodology: Rapid Applications Development

DCC personnel employed rapid applications development (RAD) for the analysis, design, and building phases of data management system development. RAD is a methodology for deploying software within compressed time frames. Rather than focusing on delivering a complete, fully integrated system through the traditional development cycle, RAD evolves the software system around a core set of deliverables that are deployed over time in order of importance. A few critical techniques were applied and strictly adhered to. The delivery date for each deliverable was agreed upon by all participants and treated as a “drop-dead” date. This forced all members of the interdisciplinary development team (biostatistics, data management, applications development, and network and systems administration) to focus on the most critical features of the system. RAD was most evident during the analysis and design phases of the system life cycle, as represented in Figure 1.

Productivity Tools

The data management system included a suite of productivity tools to help field site personnel perform their assigned tasks more efficiently [1]. A GUI provides access to these tools. Sun Microsystems’ Calendar Manager software provided a calendar for appointment scheduling, and the DCC maintained a separate calendar to keep clinical center personnel apprised of ACRN-related activities. A subject scheduler program was developed to allow clinical personnel to enter a subject's first visit date and then view and print a schedule of that subject's visits (with windows) for the complete trial. Using Sun Microsystems’ Calendar Manager was cost-effective and required no programming effort. Sun Microsystems’ MailTool, an e-mail application, was chosen as the standard tool for communications between ACRN personnel.
In addition to routine communications, MailTool provides a means to rapidly resolve data queries between DCC data management and clinical center personnel. MailTool allows clinical personnel to communicate readily with the DCC and with other clinical sites. Additional on-line tools, available at the DCC but not commonly recommended by the literature, were also made available to clinical center personnel. A public-domain, X-window-based rolodex-style tool was compiled and used to provide a global rolodex of ACRN personnel. GhostView, a public-domain X-window postscript viewer, provided a document viewing and printing tool for study-related documentation such as data collection forms.

Figure 1 Gantt chart (GUI = graphical user interface; BAGS = Beta-Agonist in Mild Asthma trial; RDBMS = relational database management system).

GUI Development

SunSoft OpenWindows Developer’s Guide 3.0.1 (Sun Microsystems), a tool designed to facilitate the development of GUIs, was the development tool of choice for the ACRN project. The tool lets the programmer create an interface and simulate its operation graphically without writing any code. Interfaces are assembled by dragging visual representations of interface elements (windows, control areas, buttons, menus, sliders, etc.) onto the environment workspace and setting the properties of the elements. Because interface design does not involve programming, design activities can focus exclusively on the appearance and functionality of the interface, and the programmer can test the simulated interface extensively without writing code. Once the interface is finalized, code generators produce the interface source code and link the interface to programs that give users access to program functionality. The OpenLook Interface Toolkit (OLIT) code generator was used to develop the ACRN project interface.

The GUI is called the Clinical Research Utility for X-Terminals (CRUX) and consists of two windows. The first, titled ACRN TOOLBAR, gives the clinical center user access to e-mail, the user calendar, the ACRN calendar, the ACRN rolodex, and an option to exit the system (logout). The second, titled ACRN STUDY PROTOCOLS, gives the user access to the application programs for any of the several concurrent studies that may be in progress. Selecting a protocol opens the clinic coordinator's main menu, which provides access to the following functions: randomize subjects, generate possible subject schedules, enter subject data, access documents and reports, and exit menu. The enter subject data option provides access to the relational database portion of the data management system. Other applications developed exclusively with Developer’s Guide include the randomize subjects module and the interface of the generate possible subject schedules module.
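The subject scheduler described under Productivity Tools (the generate possible subject schedules function in the CRUX main menu) derives a full visit schedule from the first visit date. The sketch below assumes that each later visit falls a fixed number of days after the first visit, with a symmetric window on either side; the function name, offsets, and window width are hypothetical, not the BAGS protocol values.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Visit(NamedTuple):
    number: int
    target: date
    earliest: date  # start of the allowed visit window
    latest: date    # end of the allowed visit window


def schedule_visits(first_visit: date,
                    offsets_days: List[int],
                    window_days: int) -> List[Visit]:
    """Build a visit schedule from the first visit date.

    offsets_days: days after the first visit at which each later visit is due
    (hypothetical protocol values). window_days: allowed deviation either side.
    """
    # The first visit is the anchor date, so it has no window of its own.
    visits = [Visit(1, first_visit, first_visit, first_visit)]
    for number, offset in enumerate(offsets_days, start=2):
        target = first_visit + timedelta(days=offset)
        visits.append(Visit(number, target,
                            target - timedelta(days=window_days),
                            target + timedelta(days=window_days)))
    return visits


if __name__ == "__main__":
    for v in schedule_visits(date(1994, 12, 5), [14, 28, 56], window_days=3):
        print(v.number, v.target, v.earliest, v.latest)
```

Anchoring every visit to the first visit date (rather than to the previous visit) keeps one late visit from shifting the rest of the schedule; whether the ACRN scheduler worked this way is not stated in the text.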

Data Management System Module Development

The following data management system modules were developed to provide the functionality required by the DCC and the clinical centers. A brief explanation of each module is provided.

Registration Module

The registration module is used to make the data management system aware of the existence of a patient, packet, or case report form. It is used by both clinical center and DCC data management users.

First Data Entry Module

The clinical centers use the first data entry module to enter, from the case report forms, the data collected during subject visits. Study case report forms fall into two categories: packet and single. Packet case report forms are routinely completed at a specific study visit and are grouped into a packet that is completed with the subject during that visit. Single case report forms are completed only as needed and may or may not be associated with a routinely scheduled subject visit. DCC data management users may also use this module.
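The relationship between registration and first data entry can be sketched as follows, assuming (as the module descriptions suggest but do not state outright) that a form must be registered, either within a visit packet or as a single form, before first data entry will accept it. The class, its methods, and the form codes are hypothetical; in the ACRN system these records lived in Oracle tables, not in Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class StudyRegistry:
    """Hypothetical in-memory stand-in for the registration tables."""
    patients: Set[str] = field(default_factory=set)
    # (patient_id, visit) -> form codes expected in that visit's packet
    packets: Dict[Tuple[str, int], Set[str]] = field(default_factory=dict)
    # (patient_id, form_code) for single forms registered outside a packet
    single_forms: Set[Tuple[str, str]] = field(default_factory=set)
    # (patient_id, visit or None, form_code) -> first-entry values
    first_entries: Dict[Tuple[str, Optional[int], str], dict] = field(default_factory=dict)

    def register_patient(self, patient_id: str) -> None:
        self.patients.add(patient_id)

    def register_packet(self, patient_id: str, visit: int, forms: Set[str]) -> None:
        self._require_patient(patient_id)
        self.packets[(patient_id, visit)] = set(forms)

    def register_single_form(self, patient_id: str, form_code: str) -> None:
        self._require_patient(patient_id)
        self.single_forms.add((patient_id, form_code))

    def enter_first(self, patient_id: str, form_code: str, values: dict,
                    visit: Optional[int] = None) -> None:
        """First data entry; rejects forms the system has not been told about."""
        if visit is not None:  # packet form for a scheduled visit
            if form_code not in self.packets.get((patient_id, visit), set()):
                raise KeyError(f"{form_code} is not in a registered packet for visit {visit}")
        elif (patient_id, form_code) not in self.single_forms:
            raise KeyError(f"single form {form_code} is not registered")
        self.first_entries[(patient_id, visit, form_code)] = dict(values)

    def _require_patient(self, patient_id: str) -> None:
        if patient_id not in self.patients:
            raise KeyError(f"patient {patient_id} is not registered")


if __name__ == "__main__":
    registry = StudyRegistry()
    registry.register_patient("P01")
    registry.register_packet("P01", 2, {"SPIRO", "DIARY"})  # hypothetical form codes
    registry.enter_first("P01", "SPIRO", {"fev1": 2.9}, visit=2)
```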


Interactive Data Verification Module

DCC data management users use the interactive verification module to perform a second entry of the data collected on case report forms and initially entered by clinical center users. Differences between the first and second entries are resolved by consulting the source documentation. Whenever a datum just entered during second entry differs from the first entry, the module immediately presents the discrepancy to the user, who may keep the value originally entered by the clinical center user, keep the value entered during second entry, or enter a third, new value.

Data Validation Module

The data validation module tests the data associated with a subject for values that are missing, out of range, or inconsistent. Error checks (data rules) are applied to the data, and any violations are stored and brought to the attention of DCC data management for resolution. Data validation may be conducted in either of two modes: batch or interactive. Batch validation may be run as often as DCC data management users deem necessary and subjects all data associated with a study to error checking. Interactive validation gives the DCC data management user the option of validating only the data associated with a specified user or clinical center.

Data Editing Module

DCC data management users update values in the database using the data editing module. The interface permits the user to specify the subject, visit, and type of case report form (single or packet) to be updated. If the user instead specifies a clinical center, all case report forms associated with that center are presented for selection.
Once a case report form has been chosen by either method, the corresponding data are presented to the data management user for updating, in the same case report form format used for data entry.

Data Querying and Reporting Module

Oracle Data Browser is a graphical tool for DCC data management and biostatistics users. It is designed to retrieve data from a database into a familiar spreadsheet format for analysis and reporting. With this tool, users without programming experience can formulate complex queries; when a query is submitted, any syntax errors in it are detected. Valid queries may be saved and retrieved later for reproducible analysis and reporting. This arrangement was considered superior to requiring DCC users to query the database with pure SQL scripting, and it eliminated dependence on DCC applications development personnel for writing SQL queries. The tool allows DCC personnel to engage in both ad hoc and routine querying of the study database without requesting the development of additional SQL programs.
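The interactive verification and data validation steps described for those modules can be sketched in a few functions. This is an illustration under stated assumptions, not the DCC's Oracle implementation: records are plain dictionaries, the field names `fev1`, `fvc`, and `center`, the range limits, and all function names are hypothetical, and each data rule is modeled as a function that returns an error message or `None`. The consistency rule relies on the physiological constraint that FEV1 should not exceed FVC.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
# A data rule returns an error message, or None when the record passes.
Rule = Callable[[Record], Optional[str]]


def discrepancies(first: Record, second: Record) -> Dict[str, Tuple[Any, Any]]:
    """Fields whose first-entry and second-entry values disagree."""
    fields = sorted(first.keys() | second.keys())
    return {f: (first.get(f), second.get(f))
            for f in fields if first.get(f) != second.get(f)}


def resolve(first_value: Any, second_value: Any, choice: str,
            third_value: Any = None) -> Any:
    """Apply the verifier's decision for one discrepant datum."""
    if choice == "first":    # keep the clinical center's entry
        return first_value
    if choice == "second":   # keep the second-entry value
        return second_value
    if choice == "third":    # enter a new value after checking the source
        return third_value
    raise ValueError(f"unknown choice: {choice!r}")


def required(field: str) -> Rule:
    """Missing-value check."""
    return lambda r: f"{field} is missing" if r.get(field) is None else None


def in_range(field: str, low: float, high: float) -> Rule:
    """Out-of-range check; a missing value is left to required()."""
    def check(r: Record) -> Optional[str]:
        value = r.get(field)
        if value is not None and not (low <= value <= high):
            return f"{field}={value} outside [{low}, {high}]"
        return None
    return check


def fev1_not_above_fvc(r: Record) -> Optional[str]:
    """Consistency check between two related fields."""
    fev1, fvc = r.get("fev1"), r.get("fvc")
    if fev1 is not None and fvc is not None and fev1 > fvc:
        return "fev1 exceeds fvc"
    return None


def validate(records: List[Record], rules: List[Rule],
             center: Optional[str] = None) -> List[Tuple[int, str]]:
    """Apply every rule to every record (batch mode), or only to the records
    of one clinical center when `center` is given (interactive mode)."""
    violations: List[Tuple[int, str]] = []
    for i, rec in enumerate(records):
        if center is not None and rec.get("center") != center:
            continue
        for rule in rules:
            message = rule(rec)
            if message is not None:
                violations.append((i, message))
    return violations
```

Keeping the rules as independent functions mirrors the text's separation between the rules themselves and the stored violations that data management later resolves.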

Data Management System Module Specifications

Formal module specifications were not developed. Instead, DCC personnel documented the functionality of the data management system modules much less formally than traditional specifications would; combined with the formal data flow diagram, these informal agreements of understanding proved quite satisfactory. Module prototyping, prototype review, and approval by the data management unit were used to ensure proper development of the modules.

Menu Access to Data Management System Module Functionality

Because clinical center and data management users differed in expertise and application requirements, the DCC programmers developed two different menu screens for access to applications. The clinical center user menu provides access to the registration module, consisting of patient registration, packet case report form registration, and single case report form registration, and to the first data entry module for entry of packet and single case report forms. The data management user menu provides access to the same registration and data entry options as well as to the interactive data verification, data validation, and data editing modules. These menus provide graphical button access to all functions and are displayed in a standard view to ensure reproducible operation of the system despite users' relatively low level of computer expertise.
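The two-menu arrangement amounts to a fixed, role-based option list in which the data management menu is a superset of the clinical center menu. A minimal sketch; the role keys and the `menu_for` function are hypothetical, and the button labels are paraphrased from the text:

```python
from typing import Dict, List

# Options shared by both menus (registration and first data entry).
CLINICAL_CENTER_OPTIONS: List[str] = [
    "Register patient",
    "Register packet case report form",
    "Register single case report form",
    "Enter packet case report form",
    "Enter single case report form",
]

# Modules available only from the data management user menu.
DATA_MANAGEMENT_ONLY_OPTIONS: List[str] = [
    "Interactive data verification",
    "Data validation",
    "Data editing",
]

MENUS: Dict[str, List[str]] = {
    "clinical_center": CLINICAL_CENTER_OPTIONS,
    "data_management": CLINICAL_CENTER_OPTIONS + DATA_MANAGEMENT_ONLY_OPTIONS,
}


def menu_for(role: str) -> List[str]:
    """Return the fixed, ordered button list for a role (same view every time)."""
    try:
        return list(MENUS[role])  # copy so callers cannot reorder the menu
    except KeyError:
        raise ValueError(f"no menu defined for role {role!r}") from None
```

Defining the data management menu as the shared list plus its extra modules keeps the common buttons in the same order on both screens, which supports the "standard view" goal.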

Delivery Schedules A key to the data management system software effort was the development of schedules to track the progress of building applications. As the project moved forward the schedules were constantly revised and updated. Any delay in the delivery of the software was immediately reflected in a revised software delivery schedule. In this manner, the entire team was continually apprised of the impact of delays upon the software development process.

Migration of Development to Production Database The DCC used two separate databases for ACRN development. All development occurred in a development database. After testing by the applications development and data management users, the applications were migrated to a production database, which served as the on-line transaction processing system for the entire study. A rigid regimen of development control in the development database, with controlled migration to the production database, was enforced throughout the study. The database administrator controlled access and privileges to the production database. Access to the production database was limited to data management and biostatistics personnel; applications development personnel had no privileges within the production database other than those needed to perform SQL “select” queries on the data.

Challenges The delay in delivering the data validation module was the most significant challenge. Verified data were not subjected to data validation for approximately 4 months. During this time, data acquisition and verification continued, and limited validation was conducted manually by visual examination of the case report forms. Queries were sent to the clinical centers to resolve verification-related problems, but few validation-related queries were issued. Once validation was finally implemented, a large number of validation problems were detected, and the associated queries had to be processed. While inconvenient, this situation had no serious consequences for the study or the quality of the data.

Redesigns Problems were experienced with several of the data entry screens associated with case report forms. The patient diary form data entry screens were redesigned because of an unacceptably high rate of errors in the data associated with this form. A data audit (comparison of database values against source documents) of these data found transcription errors in excess of the level generally accepted by the industry (10 errors per 10,000 values audited) [5]. This high error rate was attributed in part to the design of the data entry screens.

Future Data Management System Improvements McFadden et al. [1] recommend that a distributed data management system include local reporting capabilities to allow clinical center personnel to view reports of data.
A tool allowing local reporting of case report form data entry status and subject status was developed for the second ACRN study protocol. This Oracle application allows clinical center personnel to determine the status of subject data entered at the clinical centers. Additional modifications to modules have increased data management system functionality; the details of these modifications are beyond the scope of this initial publication. Future plans include the incorporation of ACRN study data into an ACRN data warehouse. Upon closure of a study, the clinical data would be replicated from the on-line transaction processing (OLTP) system to the data warehouse (a repository of static study-related data). In addition, an online analytical processing (OLAP) tool, designed to facilitate highly graphical ad hoc querying and analysis, would be distributed and installed on the client systems at the clinical centers. All data from the closed study would then become available to investigators for “point-and-click” analysis within the administrative guidelines established by the ACRN. With the administrative consent of the ACRN, this access could be extended to investigators currently outside the ACRN, and it would generally facilitate the availability and analysis of study-related data over a WAN.
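
The planned close-out step, in which a finished study is copied from the OLTP system into a static repository, could be sketched as follows. This is an illustration under stated assumptions only: SQLite stands in for both the Oracle OLTP database and the warehouse, and the function name, table, and file name are hypothetical. `sqlite3.Connection.backup` makes the copy, and reopening the file with SQLite's `mode=ro` URI parameter makes it read-only, matching the idea of static study data.

```python
import sqlite3
import tempfile
from pathlib import Path


def close_study_to_warehouse(oltp: sqlite3.Connection, warehouse_path: Path) -> sqlite3.Connection:
    """Copy a closed study's OLTP database into a static warehouse file.

    Connection.backup copies the whole database; the copy is then reopened
    through a read-only URI so it can be queried but not modified.
    """
    target = sqlite3.connect(warehouse_path)
    oltp.backup(target)
    target.close()
    return sqlite3.connect(warehouse_path.as_uri() + "?mode=ro", uri=True)


# Usage: a tiny hypothetical stand-in for a closed study's transactional database.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE visits (subject_id TEXT, visit INTEGER, fev1_l REAL)")
oltp.execute("INSERT INTO visits VALUES ('S001', 1, 2.9)")
oltp.commit()

with tempfile.TemporaryDirectory() as tmp:
    warehouse = close_study_to_warehouse(oltp, Path(tmp) / "study1_warehouse.db")
    print(warehouse.execute("SELECT COUNT(*) FROM visits").fetchone())  # (1,)
    try:
        warehouse.execute("DELETE FROM visits")
    except sqlite3.Error as exc:
        print("rejected:", exc)  # writes fail because the copy is read-only
    warehouse.close()
```

A real warehouse would typically also restructure the data for analysis; the sketch shows only the replicate-then-freeze step.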

TECHNOLOGY ARCHITECTURE The technology architecture defined the major types of technology platforms needed to provide an environment for managing the data. These platforms provided a means for collecting data from the clinical centers; transporting, storing, and processing the data; and delivering the data to the statisticians for analysis. The technology principles identified were the use of client/server technology; relational database technology; data access through SQL; access to all applications through a common GUI; security of data, software, and hardware at all levels of technology; centrally administered data; and recoverability of operations. System configurations were built from analysis and design specifications. The clinical center and DCC computer system configurations are described below.

Clinical Center Computer Systems The standard computer configuration at each ACRN clinical center consisted of one or more X-terminals, a modem, a printer, and one local UNIX-based server. See Appendix B for details.

X-terminals The clinical center’s X-terminal, printer, and modem were deployed several months before the start of the first protocol to facilitate communications within the ACRN. Because the clinical centers were still developing their respective LAN connections to the internet, these platforms were initially deployed on a contingency WAN of dial-up network connections. As each site achieved internet connectivity, its computer systems were switched to the internet as the primary network link.

Clinical Center UNIX Server Initially all clinical sites, with the exception of the Philadelphia clinical center, used a local UNIX server dedicated to supporting the activities of the clinical site with file services, e-mail, calendar scheduling, and printing.
These servers are an important addition to the clinical center computer systems because they minimize, to the greatest extent possible, the amount of traffic that must traverse the internet between a clinical center and the DCC. After experiencing numerous problems with slow data traffic, the Philadelphia center was eventually also provided with a local UNIX server.

Modems To ensure that a network connection is always available, each site has a modem attached to the local UNIX server and/or the X-terminal. The modem allows the clinical center to dial directly into the DCC, so that project processing can continue if a failure occurs somewhere on the internet between the DCC and the clinical center.

Printer Each clinical center has local printing capabilities. The printer attaches directly to the local UNIX server and/or the X-terminal. Because printer selection is presented through the applications and tools, users can choose to print forms, e-mail, and other correspondence.

Network Two network protocols are used to connect all computer systems throughout the ACRN: TCP/IP is used for the LAN at the clinical centers, and the dial-up WAN uses PPP to carry TCP/IP.

Internet Connectivity The use of the internet as the primary communications link proceeded in stages. At the start of the project, each clinical center began developing a LAN within its institution. The DCC facilitated this LAN development by communicating the specific needs of the project to each institution’s network organization. All centers initially used dial-up connections linking their sites directly to the DCC. As each clinical center developed its internet connection, testing was performed to measure network propagation delays. If significant propagation delays were found, a local UNIX server was deployed so that the internet could be used effectively as the primary communications link.

DCC Computer Systems Four distinct computing technology foci exist at the DCC. These four areas of technology and their computing environments encompass all systems located at and used by the DCC. A production environment was developed specifically for the databases and electronic data storage areas. A development environment was established for all data management system work related to clinical protocols in the developmental (nonproduction) stage. A test environment was established to simulate the clinical center computing environment at the DCC. A fourth area, network connectivity, provides the modems and terminal server through which the clinical centers can dial into the DCC.
See Appendix B for details.
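
The propagation-delay testing described under Internet Connectivity decided whether a site needed a local UNIX server. The paper does not state the criteria used, so the limits below (a 250 ms median round-trip time and 5 percent probe loss) and all names are hypothetical; this Python sketch only shows how such a per-site decision rule could be expressed.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class DelayTest:
    site: str
    rtt_ms: list[Optional[float]]  # one entry per probe; None means no reply

    def loss(self) -> float:
        """Fraction of probes that received no reply."""
        return sum(r is None for r in self.rtt_ms) / len(self.rtt_ms)

    def median_rtt(self) -> float:
        """Median round-trip time of the answered probes (inf if none)."""
        answered = [r for r in self.rtt_ms if r is not None]
        return median(answered) if answered else float("inf")


# Hypothetical acceptance limits; not taken from the ACRN testing.
MAX_MEDIAN_RTT_MS = 250.0
MAX_LOSS = 0.05


def needs_local_server(test: DelayTest) -> bool:
    """True if the internet path is too slow or lossy for remote X sessions."""
    return test.median_rtt() > MAX_MEDIAN_RTT_MS or test.loss() > MAX_LOSS


fast = DelayTest("site A", [80.0, 95.0, 90.0, 85.0])
slow = DelayTest("site B", [600.0, None, 750.0, 820.0])
for t in (fast, slow):
    print(t.site, needs_local_server(t))
# site A False
# site B True
```

Using the median rather than the mean keeps a single slow probe from triggering a deployment, while the loss check catches paths that are intermittently unreachable.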

DEPLOYMENT PHASE CHALLENGES From the onset of the ACRN project there was a project-wide commitment to leverage the internet as a viable resource. Cost containment was of primary importance: it was expected that using this national research network would make communications less expensive than traditional means. The commitment to use the internet was made early, and at that time the clinical centers were polled to determine their connectivity to the network. At the beginning of the study no center was connected to the internet directly, so a trial-and-error period was established to measure the success of network communication. X-terminals were selected as the interface based on projected network speeds, the features required by the applications, maintainability of the systems, and, as always, cost. As clinical centers connected to the internet, the stability and transfer speed of this medium were tested and assessed, and the network speed at four of the five clinical centers was monitored. The monitoring results indicated that network response and stability varied by region. Unfortunately the most unstable area was the midwestern region, and the Denver clinical center was directly affected. The clinical centers on the internet were monitored, and many received site visits from DCC personnel to determine how well their X-terminals worked with the ACRN applications. This monitoring showed that one site, Denver, had a severe network problem: network access speeds to Denver were very poor, and local support of its network was marginal because of a lack of support personnel. This situation led to discussions and a decision to accelerate the development of a local server capable of supporting the same application package required by the ACRN project at the DCC. Because of the extreme problems at the Denver site, the DCC designed and deployed a restricted-feature server that immediately gave the Denver site the ability to use the ACRN applications. The goal of this initial deployment was to minimize the amount of traffic that had to cross the internet while preserving the data integrity provided by the database and the ACRN application.
Immediately following this deployment, the DCC continued with its plans to complete a server configuration supporting both network and dial-up operation. Dial-up was provided to maintain connectivity in the event of a long-term network outage. The current server and X-terminal configuration provides ongoing support for the applications required by the ACRN and also mitigates the effects of unstable and slow internet access. This configuration continues to use the internet as a viable, cost-effective medium for transferring data between the DCC and the ACRN clinical centers.

THE INTERNET AS WAN MEDIUM The choice of the internet as the primary communications link for this project was a very aggressive decision. Although internet access was available within all the institutions at the onset of the project, it was not available within any of the specific clinical areas used by the project. The commitment of the involved parties to use the internet was the overriding catalyst for this decision. The use of the internet was not without concerns, but the accrued benefits justified it. At the start of the project there were two immediate requirements: to provide the clinical centers with a real-time method of communicating with each other and with the DCC, and to have each clinical center acquire an internet connection in its clinical area.


Because the primary communications link was not available at the onset of the project, the ACRN WAN began by connecting the clinical centers and the DCC through a modem-based contingency WAN. This contingency WAN allowed the clinical centers to begin the real-time communications (e.g., e-mail and calendar scheduling) needed to expedite the development of the initial ACRN protocols. Starting the WAN with a contingency plan also provided an excellent proving ground for the integrity of the WAN. While acquiring their internet connections, the clinical centers actively used the backup modem WAN for the development of project protocols, e-mail, and calendars. As each clinical center acquired an internet connection, the site independently switched from the contingency modem network to the internet as its primary communications link. From that point on, each site had two options (the internet and the modem WAN) for ensuring connectivity with the other sites.

Using the internet avoided the need to maintain leased lines or dedicated circuits to sites throughout the country and saved the associated direct connection charges. With the internet as the primary WAN connection, the DCC has been able to efficiently troubleshoot, manage, and update the deployed computer systems within the ACRN network with a minimum of clinical center intervention.

A major concern related to the use of the internet as the primary communications link has been, and continues to be, the adequacy of bandwidth for data transmission. As the clinical centers developed their local internet connections, end-to-end internet performance was tested. These tests revealed several severe bandwidth-related problems within the internet-based WAN. Internet communication requires traversal of many autonomously managed network links, any of which can dynamically reduce the speed of the network or cause total network failure.
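
The gap described here, a contingency that covered a dead link but not a slow one, can be made concrete with a small classifier that distinguishes three path states instead of two. This Python sketch is hypothetical: it assumes end-to-end throughput is measured elsewhere and uses the 19,200 bit/s modem line rate from Appendix B as the point below which the dial-up line would be faster. The ACRN's actual remedy, described in the following paragraph, was local UNIX servers rather than automatic switching.

```python
from enum import Enum
from typing import Optional

# Modem line rate given in Appendix B, before any data compression (kbit/s).
MODEM_LINE_RATE_KBPS = 19.2


class PathState(Enum):
    UP = "up"
    DEGRADED = "degraded"  # reachable, but slower than the dial-up line
    DOWN = "down"


def classify(internet_kbps: Optional[float]) -> PathState:
    """Classify measured end-to-end internet throughput (None = no route)."""
    if internet_kbps is None:
        return PathState.DOWN
    if internet_kbps < MODEM_LINE_RATE_KBPS:
        return PathState.DEGRADED
    return PathState.UP


def choose_link(internet_kbps: Optional[float]) -> str:
    """Use the internet unless it is down or degraded; otherwise dial the DCC."""
    return "internet" if classify(internet_kbps) is PathState.UP else "modem"


for kbps in (None, 9.6, 256.0):
    print(kbps, classify(kbps).value, choose_link(kbps))
# None down modem
# 9.6 degraded modem
# 256.0 up internet
```

A two-state check (up or down) would keep a 9.6 kbit/s path in service; the explicit DEGRADED state is what makes the reduced-speed case visible.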
The ACRN modem-based WAN became the contingency for a total internet failure. Contingency plans had not been adequately developed for what proved to be a more problematic situation: a reduced-speed network. Without end-to-end control of the WAN, the ACRN did in fact experience difficulties with WAN response speed. To rectify this situation, the clinical center system configuration evolved to include local UNIX servers. All clinical centers now have a local UNIX server, which minimizes the amount of data traversing the internet; real-time traffic across the WAN is limited to database packets. With this configuration, the internet has become a viable primary communications medium.

From time to time, network-related difficulties have occurred at either the clinical centers or the DCC. Network difficulties at any participating institution must be resolved by that institution’s computing group, whose local priorities may supersede ACRN WAN-related difficulties. The local network administration at any institution may implement changes that are not announced even within its own institution, let alone to the DCC network administrators, and these changes may directly affect the functionality of the ACRN WAN. The DCC has successfully resolved past instances of this kind that affected the ACRN WAN; however, firewall and security issues are becoming increasingly volatile and raise similar problems more frequently.

Network security is one of the latest areas to significantly affect the ACRN WAN. The DCC has implemented the ability to authenticate and encrypt data between the local UNIX servers and the DCC databases. With this security feature in place, institutional concerns regarding the protection of data and outside connections to internal computer systems have been addressed.

CONCLUSION Despite the challenges outlined in this article (i.e., short development cycles, lack of GUI programming experience, and use of the internet as a WAN), the ACRN network continues to serve the research centers for which it was designed. The protocols continue to be delivered on time, and the original interfaces are still effective. The DCC is currently developing its first web-based deployment of a protocol application, with expected delivery in fall 2001. Delivering the applications through a web interface is expected to reduce programming and application maintenance costs, and the web will also provide global access to an ACRN data repository.

This work was supported by grant no. U10HL51845, Division of Lung Diseases, National Heart, Lung, and Blood Institute, National Institutes of Health.

REFERENCES
1. McFadden E, LoPresti F, Bailey L, et al. Approaches to data management. Control Clin Trials 1995;16:30S–65S.
2. Barker R. CASE Method: Tasks and Deliverables. Wokingham: Addison-Wesley; 1990.
3. Spewak S, Hill S. Developing a Blueprint for Data, Applications, and Technology: Enterprise Architecture Planning. New York: John Wiley & Sons/QED; 1992.
4. Barker R. CASE Method: Entity Relationship Modeling. Wokingham: Addison-Wesley; 1990.
5. Gassman J, Owen W, Kuntz T, et al. Data quality assurance, monitoring, and reporting. Control Clin Trials 1995;16:104S–136S.

APPENDIX A: ACRN CENTERS The participating clinical centers are: (1) Harvard University, Brigham and Women’s Hospital, Boston, Massachusetts; (2) National Jewish Center for Immunology and Respiratory Medicine, Denver, Colorado; (3) University of Wisconsin, Madison, Wisconsin; (4) Thomas Jefferson University, Thomas Jefferson Medical College, Philadelphia, Pennsylvania; (5) University of California, San Francisco Medical Center, San Francisco, California; and (6) Columbia University, Harlem Lung Center, New York, New York. Spirometry test session quality control (overreading) is performed at National Jewish Center. The data coordinating center (DCC) is located at Pennsylvania State University, Milton S. Hershey Medical Center, Department of Health Evaluation Sciences, Hershey, Pennsylvania. The center located at Harlem Hospital joined ACRN in December 1995.


APPENDIX B: TECHNOLOGY ARCHITECTURE Clinical Center Computer Systems X-terminals Human Designed Systems (HDS) X-terminals were selected for use at the clinical centers and in the DCC test environment. A standard system consists of a 17-inch high-resolution monitor, an internal 42-megabyte PCMCIA disk drive, 12 megabytes of RAM, an internal serial communications card, a parallel printer port, and a 10BaseT ethernet network connection. This X-terminal was selected because it provided the required features less expensively than alternative systems. The 17-inch color monitor’s resolution is equivalent to that of a Sun Microsystems monitor, producing a high-resolution display of 1152 × 900 pixels in 256 colors. This feature is important to the project because color is integral to the various screens and helps users distinguish among the protocols that the system may be displaying.

The HDS X-terminal has a UNIX-like microkernel that allows DCC monitoring and reporting of X-terminal activity whenever the terminal is in use. This microkernel resides on the local disk and allows the HDS X-terminal to function identically regardless of the type (PPP versus TCP/IP) or speed of the network connection. The microkernel also allows the DCC to upgrade the HDS X-terminal remotely: monitoring, installation of operating system patches, and software upgrades can all be initiated and conducted from the DCC. The HDS X-terminal may be used either in a dial-up PPP network configuration or via a LAN connection (TCP/IP); the PPP capability gives the TCP/IP ethernet network connection redundancy. The system can use either the 10BaseT port or the serial port to connect to the network.

The HDS X-terminal has an internal disk drive that stores all the X Window protocols needed to run a GUI windowing system. Locating the windowing protocols locally on the X-terminal’s drive minimizes the amount of traffic that traverses the network connection, which maximizes the speed with which the X-terminal responds. Without this arrangement, the windowing protocols would have to traverse the network connection from the DCC server to the X-terminal, significantly slowing the response of the X-terminals.
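
A rough worked estimate shows why keeping the display work on the terminal matters. Assuming an 8-bit indexed framebuffer (256 colors need 8 bits per pixel) and no compression, one full 1152 × 900 screen is about 1 MB, and shipping it uncompressed over the 19,200 bit/s dial-up line would take over seven minutes. The function name is illustrative, and the calculation ignores protocol overhead.

```python
import math


def framebuffer_bytes(width: int, height: int, colors: int) -> int:
    """Bytes needed for one full screen of indexed-color pixels.

    Each pixel stores an index into a palette of `colors` entries, so it
    needs ceil(log2(colors)) bits.
    """
    bits_per_pixel = math.ceil(math.log2(colors))
    return width * height * bits_per_pixel // 8


screen = framebuffer_bytes(1152, 900, 256)
print(screen)  # 1036800 bytes, just under 1 MB

# Seconds to send one uncompressed full screen over the 19,200 bit/s line.
print(round(screen * 8 / 19_200))  # 432
```

By contrast, an X server running on the terminal receives compact drawing requests and renders the pixels itself, so the full screen image never crosses the link.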

Clinical Center UNIX Server The local UNIX server is a SUN Microsystems SPARC 5 workstation. The server has an 80-megahertz SPARC processor, 32 megabytes of RAM, one gigabyte of local disk storage, and connects directly to the LAN at a clinical center. The server stores all the DCC-developed data management system and productivity tools software that is used by the clinical center. The server executes the applications that connect to the databases at the DCC and communicates via the LAN and the internet with the DCC. The X-terminal acts as the input device to the server for interactions with the executing applications. Printing is also managed on the server and forwarded to the attached printer.


Modems The modem is an AT&T Paradyne COMSPHERE model 3820, selected for its performance and manageability over standard phone lines. It supports V.32terbo modulation at a data rate of 19,200 bits per second. It provides CCITT V.42 hardware error correction and V.42bis/MNP Class 5 hardware data compression, which can increase effective throughput up to fourfold. The modem has an auto-dial feature with stored numbers, automatic reset of stored parameters, and multiple configurations for use with the X-terminal and the local UNIX servers. A PPP connection over the modem does not give the X-terminal response speeds equal to those over the LAN; however, providing continuity of data processing through an alternate means of network connectivity was a compelling reason for this modem configuration.

Printer The printers are HP LaserJet Model 4ML or 5MP printers, which support Adobe PostScript Level 2 and print four pages per minute. Ease of maintenance and low cost were important selection criteria. The only consumable item is a single replaceable toner cartridge rated for approximately 3000 pages.

DCC Computer Systems DCC Production Environment The production environment consists of the DCC database server and associated peripherals. The database server is currently a SUN Microsystems SPARC 20, Model 62 UNIX server with dual 60-megahertz SPARC processors. The server currently has 300 megabytes of RAM, and its disk storage was upgraded from ten gigabytes to 60 gigabytes. The server supports the ACRN databases and the electronic files and applications used for the data management system within the ACRN network. Nightly backups are performed on this server, which is available 24 hours a day, with outages only for scheduled maintenance.
DCC Development Environment The development environment consists of SUN Microsystems SPARC 5 computer systems assigned to the data management, biostatistics, and application development staff. All of these SPARC 5 systems have at least 32 megabytes of RAM, one gigabyte of local disk storage, and a high-speed connection to the DCC network. They can support multiple X Window sessions for development and data management work.

DCC Test Environment The test environment consists of a complete clinical center system configuration and a test database workstation. This environment was established to ensure that the development environment is configured correctly before data management system development begins. The clinical center configuration at the DCC serves two purposes. The first is to provide the development staff with the actual equipment needed to test all data management system applications under simulated clinical center conditions. The second is maintenance-related: if a clinical center device fails, the DCC can quickly configure a replacement for the failed component from the test environment and ship it to the clinical center, minimizing computing system-related downtime at the clinical centers. The defective component is shipped to the DCC, which then tests and repairs it. The test database system is used to debug and test all new RDBMS software releases from Oracle Corporation before they are made available to the data management system development teams. In this manner, the DCC ensures that all commercial database products used in developing the data management system are compatible with existing data management system applications and perform stably.

DCC Network Connectivity Network connectivity at the DCC consists of a set of six AT&T Paradyne COMSPHERE model 3820 modems and a Xyplex terminal server. This equipment allows the clinical centers to dial directly into the DCC and establish a network connection with the DCC LAN. The DCC modems meet the specifications given above for the clinical center modems, with one additional feature: the DCC model 3820 has a control panel that allows it to remotely monitor and control the clinical center modems. This remote management is password-protected and can be performed while the modems are transferring data, giving the DCC real-time monitoring of modem data transfer when necessary. The Xyplex terminal server attaches directly to the DCC internal network and runs the PPP protocol needed for the modem connections between the DCC and the clinical center computer systems. All locally networked DCC systems communicate over 10BaseT ethernet supporting the TCP-based networks.
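
The modem figures in this appendix support a quick capacity estimate for the dial-up path. The Python sketch below is a back-of-the-envelope model under stated assumptions, not a description of DCC software: it treats the 19,200 bit/s line rate as fully usable, ignores PPP and other protocol overhead, and treats the fourfold figure for the modem's hardware data compression as a best case. The function name and the 96,000-byte batch are hypothetical.

```python
def transfer_seconds(payload_bytes: int, compression_ratio: float = 1.0,
                     line_rate_bps: int = 19_200) -> float:
    """Estimate the time to send a payload over the dial-up link.

    compression_ratio is the factor by which the modem's hardware data
    compression shrinks the data; 4.0 is the best case, and real ratios
    depend on the data. Protocol and framing overhead are ignored.
    """
    if compression_ratio < 1.0:
        raise ValueError("compression ratio must be at least 1")
    return payload_bytes * 8 / (line_rate_bps * compression_ratio)


batch = 96_000  # hypothetical batch of entered case report form data, in bytes
print(transfer_seconds(batch))       # 40.0 (no compression)
print(transfer_seconds(batch, 4.0))  # 10.0 (best-case fourfold compression)
```

Text-like database traffic tends to compress well, while already-compressed files gain little, so real transfer times fall somewhere between the two figures.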
