Behavior Research Methods, Instruments, & Computers 1994, 26 (2), 198-201
8. CLINICAL APPLICATIONS Chaired by Linda C. Petty, Hampton University
A client-server technology solution for a multicenter research database

FRANÇOIS M. LALONDE
Laboratory of Clinical Science, NIMH, National Institutes of Health, Bethesda, Maryland

ELIZABETH F. BISHOP
Custom Software Services, Herndon, Virginia

and

GREGORY MARTON and ALEX MARTIN
Laboratory of Clinical Science, NIMH, National Institutes of Health, Bethesda, Maryland

As researchers take advantage of advances in computer and related technologies, the amount of data has increased dramatically. Recent developments in database management systems (DBMS), such as client-server database technology, provide the necessary tools for merging and managing data from various fields of study as well as various locations. This emerging technology allows the distribution of processing and data storage operations to a variety of computer platforms, thereby maximizing efficiency and flexibility. A DBMS that uses client-server technology is described.

As psychological and medical researchers have taken advantage of developments in computer and related technologies, the amount of data to be organized and managed has increased dramatically during the last decade. Computerized testing programs such as MEL (Psychology Software Tools, 1990), VSearch (Enns, Ochs, & Rensink, 1990), and SuperLab (Haxby, Parasuraman, Lalonde, & Abboud, 1993) have greatly facilitated test development, data collection, and data analysis, thereby enabling the researcher to conduct more studies more efficiently. Database management needs have also grown as a result of the multidisciplinary nature of many biomedical studies. This has been particularly evident in the clinical neurosciences, in which there has been an increasing need to integrate data from disparate sources (e.g., psychology, electrophysiology, neurochemistry, and brain imaging).
We would like to thank Michelle Ugas, Andrew Schwartz, and Donald Tiedemann of the DB2 support group at the Division of Computer Research and Technology, National Institutes of Health. Their participation in the database design and their support of the many layers of programs made this project possible. We would also like to thank Cheri Wiggs for her helpful comments in writing this manuscript and John Bishop for his help in using the SAS software. Correspondence should be addressed to François M. Lalonde; LCS, NIMH, NIH; Building 10, Room 3041; Bethesda, MD 20892 (e-mail:
[email protected]).
Copyright 1994 Psychonomic Society, Inc.
This article begins with an account of the conditions that motivated our examination of our database management needs and goals. Once we had identified our needs and goals, we turned our efforts toward the development of an effective database design and the best possible implementation of this design. We chose a client-server technology as a solution, because it proved to be the most flexible and efficient method of managing the data. Our particular client-server system is described here in detail. Since other sites may have fewer computing resources, we also give a brief example of a smaller scale client-server system for those who would like to take advantage of this new technology.

Data from a large-scale study of HIV-infected individuals had been collected at the Walter Reed Army Medical Center. The study included repeated psychological, neuropsychological, medical, pharmacological, and radiological evaluations at 6-month intervals over the course of several years (Martin et al., 1992). The data had been collected by a number of investigators from different departments, who used different types of computers and stored the data in different formats. Whenever larger data sets containing data from several investigators were analyzed, a tedious and time-consuming process of merging existing data sets had to be performed. When new data were added to the individual study data sets, the merging process had to be repeated. This cumbersome organization of the data prompted a search for a database management system (DBMS) that could be run on the available computer hardware and facilitate the process of updating or querying the data.

An exhaustive examination of our database management needs and goals resulted in the following list of requirements. First, although data sets actually reside on several computers, they should appear to the user as one large, unified database. Second, the user interface software should take full advantage of the local computing power in providing an intuitive, graphically oriented interface to the database. Third, a user should be able to generate either an in-depth report on an individual or a spreadsheet of selected measures from a group of individuals. The DBMS should have the ability to accommodate an expanding database in terms of additional computers and additional sites. Although shared data should be easily accessible, safeguards regarding patient confidentiality and data integrity must be implemented. Data integrity includes frequent backups and a database design that prevents any redundancy in the data. Any interaction with the database should minimize the data flow over communications lines. By minimizing the data flow between the user and the database, queries and updates can run over existing phone lines and, whenever they are available, high-speed lines can accommodate many simultaneous users.

Spreadsheet programs and flat file managers were clearly inadequate for our needs. In such programs, each subject record had to contain all of the possible variables. Whenever a variable was added to the database, it had to be present in all of the records, even if only a few subjects actually had data for that variable. A relational database management system (RDBMS) could, however, meet all of our needs and goals.
In an RDBMS, data are stored in related tables (also called entities) that have rows and columns (also called properties). The tables form relationships by using variables called keys. This type of database design has been shown to enhance data storage and retrieval by reducing the processing of redundant data and allowing the addition of vast amounts of new data without modifying the original tables. The IBM DATABASE 2 (DB2; IBM, 1992a) RDBMS was chosen for our particular database because it was easily accessible and well supported at our site. However, the relational database design presented here can be implemented in most fully functional RDBMSs, including those that run on personal computers.
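The contrast with a flat file can be made concrete with a small sketch. The following is an illustration only, not the DB2 schema used in this project: it uses Python's built-in sqlite3 module as a stand-in RDBMS, and the table names, column names, and values (subject, score, variable_name, and the sample rows) are hypothetical. It shows two tables related through a key, and a new variable being recorded for one subject without altering any existing table or row.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE subject (
            subject_id INTEGER PRIMARY KEY,
            gender     TEXT
        );
        -- One row per measure per subject; subject_id is the key linking the tables.
        CREATE TABLE score (
            subject_id    INTEGER NOT NULL REFERENCES subject(subject_id),
            variable_name TEXT    NOT NULL,
            value         REAL    NOT NULL,
            PRIMARY KEY (subject_id, variable_name)
        );
    """)
    # Made-up sample rows for illustration only.
    conn.executemany("INSERT INTO subject VALUES (?, ?)",
                     [(1, "M"), (2, "F"), (3, "M")])
    conn.executemany("INSERT INTO score VALUES (?, ?, ?)",
                     [(1, "reaction_time_ms", 412.0),
                      (2, "reaction_time_ms", 388.5)])
    return conn


def add_measure(conn, subject_id, variable_name, value):
    """Record a new variable as a new row; no table or existing row is changed."""
    conn.execute("INSERT INTO score VALUES (?, ?, ?)",
                 (subject_id, variable_name, value))


def scores_for(conn, subject_id):
    """Join the two tables on the key and list one subject's measures."""
    rows = conn.execute(
        """SELECT sc.variable_name, sc.value
           FROM subject AS su
           JOIN score AS sc ON sc.subject_id = su.subject_id
           WHERE su.subject_id = ?
           ORDER BY sc.variable_name""",
        (subject_id,))
    return rows.fetchall()
```

In a flat file, adding a variable for Subject 1 would add an empty field to every other record. Here, a call such as add_measure(conn, 1, "quinolinic_acid", 35.2) adds a single row, and subjects without that measure simply have no row for it.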
Database Design and Implementation
Although new tables can be added to the DB2 database, the goal was to create a relational database design that would accommodate virtually any type of biomedical study. The design stage involved an in-depth analysis of all of the data generated by the HIV studies and the types of queries commonly made of that database. Date's (1984) guide for the creation of a relational database design was used; it yielded a relational database design consisting of 10 tables. Each subject had one or more rows of data
within a table. The data for each subject were located in columns within their respective rows. The 10 tables were designed to accommodate a multitude of measures, including biochemical assay results and psychological profiles. Figure 1 graphically presents the tables and their relationship to each other. Each measure (stored in the Score_Master table) was linked to tables describing the measure in more detail (Score_Detail, Variable), the subject's current state (Pat_Admin, Pat_Hist, Diagnosis, Occupation), and the conditions under which the measures were obtained (Location, Study, Drug). The design accommodated multiple testing sessions. It also enabled the efficient creation of smaller data sets for subsequent analysis, as well as in-depth reports of individual subjects. The design underwent rigorous normalization to avoid redundant data, which can often lead to ambiguities in the database. A more detailed description of the database design, including a description of the keys and other components of the tables, is beyond the scope of this paper.

Figure 1. Database design, including the 10 tables and their links. [Diagram not reproduced; legible table labels include Diagnosis, Pat_Hist, Score_Master, Occupation, Score_Detail, Drug, and Variable.]

Once the relational database design was completed, the tables were created by using Structured Query Language (SQL) statements submitted to the DB2 RDBMS. SQL consists of a set of commands issued to the RDBMS to create and manage the database, to access the database, and to grant access to the database. SQL has become an industry standard, and it can be used with most multiuser RDBMSs. The tables were created by using IBM 3270 terminal emulation running on a desktop computer. All of the original data sets from the studies were then transferred to the mainframe and loaded into the appropriate tables with utilities that were developed by the DB2 group at our site. Access to the tables was managed by the database administrator, who implemented views of the database. Individuals or groups of individuals are given certain views of the data that restrict their access to the selected variables.

Although the database could be accessed by using IBM's Query Management Facility (IBM, 1992b), this user interface required some time and effort to master. Moreover, this initial system did not take full advantage of the desktop computer's power in providing an intuitive user interface and could not accommodate a database stored anywhere other than on the mainframe. Therefore, a client-server technology solution was adopted in an attempt to meet all of these goals.
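The use of views to restrict access to selected variables can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual DB2 definitions: Python's built-in sqlite3 module stands in for the RDBMS, the table names pat_admin and score_master echo two of the tables named above, and every column name, the view name research_scores, and the sample rows are hypothetical. SQLite has no GRANT statement, so the sketch shows only the view itself; in a multiuser RDBMS such as DB2, the administrator would typically grant users SELECT privilege on the view rather than on the base tables.

```python
import sqlite3


def build_db_with_view():
    """Create two base tables and a view that hides identifying columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pat_admin (
            patient_id INTEGER PRIMARY KEY,
            last_name  TEXT,
            gender     TEXT
        );
        CREATE TABLE score_master (
            score_id   INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL REFERENCES pat_admin(patient_id),
            variable   TEXT NOT NULL,
            value      REAL NOT NULL
        );
        -- The view exposes scores and gender but omits the patient's name.
        CREATE VIEW research_scores AS
            SELECT s.patient_id, p.gender, s.variable, s.value
            FROM score_master AS s
            JOIN pat_admin AS p ON p.patient_id = s.patient_id;
    """)
    # Made-up sample rows for illustration only.
    conn.executemany("INSERT INTO pat_admin VALUES (?, ?, ?)",
                     [(1, "Brown", "M"), (2, "Smith", "F")])
    conn.executemany("INSERT INTO score_master VALUES (?, ?, ?, ?)",
                     [(10, 1, "reaction_time_ms", 412.0),
                      (11, 2, "reaction_time_ms", 388.5)])
    return conn


def view_columns(conn, name):
    """List the columns visible through a table or view.

    The name is interpolated directly, so pass only trusted identifiers.
    """
    cursor = conn.execute(f"SELECT * FROM {name} LIMIT 0")
    return [column[0] for column in cursor.description]
```

A user who queries research_scores sees the measures and the gender code for each patient, but the last_name column of pat_admin is not reachable through the view.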
The Client-Server Solution

OracleCard (Oracle, 1992) was chosen as the client software primarily because it ran on both Macintosh and IBM PC computers. Furthermore, other Oracle software products, which were needed to connect the client computer to a DB2 database, were provided and supported by the mainframe computing facility. In particular, these Oracle software products included SQL*Connect to DB2, a program that runs on the mainframe and translates the Oracle SQL commands into DB2 SQL commands. (Although efforts have been made to fully standardize SQL, minor differences between vendors' dialects must be reconciled by products like SQL*Connect.) Another program, SQL*Net (Oracle, 1991), resides on both the client and the server computers. SQL*Net is an Oracle network interface program that enables an Oracle tool (e.g., SQL*Connect to DB2, OracleCard) running on one computer to interact with ORACLE databases that reside on different computers.

Another layer of software programs was particular to this site. Since the site's network used Transmission Control Protocol/Internet Protocol (TCP/IP) as its network protocol, programs such as MacTCP, PCTCP, and SNSTCP were needed so that the various computers could communicate with each other. Figure 2 shows the flow of data through the layers of programs located on client and server machines. Although the layers of programs may give the impression that such a configuration is complex and would require additional support, the opposite is true. The layers take care of differences between computers and provide a more homogeneous computing environment for the development and maintenance of a client-server database.

The client software, OracleCard, uses a HyperCard-like environment consisting of cards, stacks, fields, and buttons. Optimized for Windows and Macintosh, OracleCard is fully portable and runs in both environments without manual conversions.
Figure 2. Configuration of clients and servers, including software. [Diagram not reproduced; legible labels include IBM 3090 Mainframe (MVS), OS/2 Server, IBM DB2, Oracle Database Server, Oracle SQL*Connect to DB2, Oracle SQL*Net, Oracle SQL*Net TCP/IP, PCTCP, Interlink SNSTCP, Ethernet, OracleCard, PC Client, and Mac Client.]

Based largely on our relational database design, the client application includes three cards, each developed to accomplish a specific task.

First, the Individual Record card (Figure 3) displays and allows the update of information related to specific individuals. Information such as demographic data, medical history, test scores, and the site where the subject is currently being seen is included on this card. Hospital information systems typically require the user to navigate through many screens before obtaining specific information on an individual patient. In contrast, the Individual Record card was designed to allow the user to have all of the subject's data available within one card. The card fits on a portrait-size screen, but it allows a user with a smaller screen to scroll vertically across it. Scrolling fields and pop-up menus provide a truly interactive and intuitive user interface.

Figure 3. Upper portion of the Individual Record card. [Screenshot not reproduced; legible field labels include Research Location, Patient Id, Screen Date, name and address, Caregiver, Referring Physician, and Demographics (e.g., Date Of Birth, Education, Occupation, Current Diagnosis, Gender, Ethnic Group, Handedness, Marital Status, Stage Of Disease).]

Second, the Search card allows the user to select multiple criteria in order to generate a subset of the data, which can then be saved on separate Individual Record cards or as a tab-delimited text file on the client. This feature is particularly useful for small-scale statistical analyses of a subset of the data. The search procedure is simplified by dynamically changing the available choices, depending on the user's previous selection in another field. For instance, if a user has restricted his/her query to a group of individuals of a particular gender, he/she would be offered only the alternatives and codes that apply to that query. Queries are automatically saved as separate cards and can be deleted at any time. Experienced users have the option of directly writing their queries into the large field. Again, strategic use of pop-up menus and scrolling fields allows large amounts of information to be effectively presented and managed.

Third, the Security card is used to provide safeguards against breaches in confidentiality and unauthorized access. When given Authoring privileges by the local database administrator, the more experienced user can move the location of fields and buttons in order to customize the cards to suit his/her own needs. Since security measures are far better developed for server software than for client software, it was decided that most of the security tasks should be given to the database server. Consequently, no amount of tinkering at the client level could violate security measures.
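The behavior described for the Search card, in which several user-selected criteria yield a subset of the data that can be saved as a tab-delimited text file, can be sketched in a few lines. The sketch below is not OracleCard code and does not reproduce the application's queries. It assumes Python with the built-in sqlite3 and csv modules, a hypothetical patient table, and a hypothetical list of searchable fields (ALLOWED_COLUMNS); the sample rows are made up.

```python
import csv
import io
import sqlite3

# Hypothetical set of fields a user may search on.
ALLOWED_COLUMNS = {"gender", "diagnosis", "location"}


def build_query(criteria):
    """Turn {column: value} criteria into a parameterized SELECT statement."""
    clauses, params = [], []
    for column, value in sorted(criteria.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported search field: {column}")
        clauses.append(f"{column} = ?")  # the value travels separately, as a parameter
        params.append(value)
    sql = "SELECT patient_id, gender, diagnosis, location FROM patient"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY patient_id", params


def to_tab_delimited(cursor):
    """Render a result set as tab-delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


def demo_connection():
    """Build a small in-memory table of made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, "
        "gender TEXT, diagnosis TEXT, location TEXT)")
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", [
        (1, "M", "HIV+", "Walter Reed"),
        (2, "F", "HIV+", "Walter Reed"),
        (3, "M", "HIV-", "NIH"),
    ])
    return conn
```

For example, build_query({"gender": "M", "diagnosis": "HIV+"}) returns a statement with two placeholders and the parameter list ["HIV+", "M"]; executing it and passing the cursor to to_tab_delimited gives text that a spreadsheet or statistics package can read. Restricting the searchable fields to a fixed list is one way to keep free-form input out of the SQL text.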
Discussion

Client-server technology is based on the ability of a client (e.g., a PC, a Mac, or a UNIX workstation) to support and control the user interface, thereby allowing the server (another PC, Mac, or UNIX workstation, or a mainframe) to dedicate its resources to DBMS functions. The client and server communicate using an industry-wide set of operators such as Structured Query Language (SQL). The advantages of a DBMS based on client-server technology include (1) the ability of programs to interact with other programs and data on various platforms, (2) improved performance by distributing the processing load across clients and servers, and (3) cost savings by matching hardware specifications to system requirements. For example, with a minimal investment, a small site may purchase relatively inexpensive client and server software and benefit from all of the RDBMS's features. Some of the disadvantages of an RDBMS based on client-server technology include the need to manage a more complex environment.
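The division of labor between client and server, and its effect on the amount of data sent over the communications line, can be illustrated with a small sketch. It is an assumption-laden stand-in rather than a model of the system described here: Python's built-in sqlite3 module runs in the same process as the caller, so no network is involved, and the number of rows returned to the caller is used as a rough proxy for data flow. The score table and its contents are hypothetical.

```python
import sqlite3


def make_server_db(n_rows=1000):
    """Build an in-memory table that plays the role of the server's data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE score (patient_id INTEGER, value REAL)")
    conn.executemany("INSERT INTO score VALUES (?, ?)",
                     [(i % 10, float(i)) for i in range(n_rows)])
    return conn


def mean_on_client(conn):
    """Fetch every row and average locally; all rows are returned to the caller."""
    rows = conn.execute("SELECT value FROM score").fetchall()
    mean = sum(value for (value,) in rows) / len(rows)
    return mean, len(rows)


def mean_on_server(conn):
    """Let the DBMS compute the average; one row is returned to the caller."""
    rows = conn.execute("SELECT AVG(value) FROM score").fetchall()
    return rows[0][0], len(rows)
```

Both functions return the same mean, but the second returns a single row regardless of table size, which is the property that makes queries practical over slow lines.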
DB2 enables centralized technical support, maximal processing speed, and data integrity. Since the costs associated with support and maintenance of the DB2 DBMS are shared by numerous users at the site, they compare favorably with other database servers. An additional benefit of using the mainframe as a server was its ability to perform large-scale statistical analyses of data from different locations. The SAS/ACCESS (SAS, 1989) interface to DB2 and Base SAS software (SAS, 1992) were used for large-scale statistical analyses.

Smaller facilities may still take full advantage of client-server technology. Prices are rapidly decreasing as performance and ease of use increase. A low-end client-server system could include Microsoft's SQL Server for Windows NT (Microsoft, 1993b) and client software such as FoxPro (Microsoft, 1993a). A system based on these server and client programs could start at less than $10,000. Given the existing base of IBM PCs and compatibles, client-server systems are rapidly becoming feasible for many sites.

The DBMS described here may serve as a model for other systems. Client-server technology provides powerful DBMS tools and distributes processing, storage, and costs to all the users. Ultimately, many DBMSs like this one could be interconnected to create large research networks. These networks would further enhance the exchange of information, promote the establishment of standards, and allow large-scale studies to be performed more efficiently.

REFERENCES

DATE, C. J. (1984). A Guide to DB2. Reading, MA: Addison-Wesley.
ENNS, J. T., OCHS, E. P., & RENSINK, R. A. (1990). VSearch: Macintosh software for experiments in visual search. Behavior Research Methods, Instruments, & Computers, 22, 118-122.
HAXBY, J. V., PARASURAMAN, R., LALONDE, F., & ABBOUD, H. (1993). SuperLab: General-purpose Macintosh software for human experimental psychology and psychological testing. Behavior Research Methods, Instruments, & Computers, 25, 400-405.
IBM CORPORATION (1992a). IBM DATABASE 2 (Computer program). San Jose, CA: Author.
IBM CORPORATION (1992b). Query Management Facility: Version 3 (Computer program). San Jose, CA: Author.
MARTIN, A., HEYES, M. P., SALAZAR, A. M., KAMPEN, D. L., WILLIAMS, J., LAW, W. A., COATS, M. E., & MARKEY, S. P. (1992). Progressive slowing of reaction time and increasing cerebrospinal fluid concentrations of quinolinic acid in HIV-infected individuals. Journal of Neuropsychiatry & Clinical Neurosciences, 4, 270-279.
MICROSOFT CORPORATION (1993a). FoxPro Version 2.5 (Computer program). Redmond, WA: Author.
MICROSOFT CORPORATION (1993b). Microsoft's SQL Server for Windows NT (Computer program). Redmond, WA: Author.
ORACLE CORPORATION (1991). SQL*Net TCP/IP (Computer program). Redwood City, CA: Author.
ORACLE CORPORATION (1992). OracleCard Version 1.1 (Computer program). Redwood City, CA: Author.
PSYCHOLOGY SOFTWARE TOOLS (1990). MEL: Micro Experimental Lab (Computer program). Pittsburgh, PA: Author.
SAS INSTITUTE INCORPORATED (1989). SAS/ACCESS Interface to DB2 (Computer program). Cary, NC: Author.
SAS INSTITUTE INCORPORATED (1992). Base SAS software (Computer program). Cary, NC: Author.