ACAD EMERG MED • November 2004, Vol. 11, No. 11 • www.aemj.org

Automating Research Data Collection

Jason S. Shapiro, MD, Michael J. Bessette, MD, Kevin M. Baumlin, MD, Deborah Fish Ragin, PhD, Lynne D. Richardson, MD

Abstract
This article reviews the capabilities, advantages, and disadvantages of three forms of automated data collection (scannable data forms, Web-based forms, and handheld computers) compared with the current standard of data entry by hand on paper forms. Each of these methods is reviewed with respect to ease of use, experience required of designer, end-user training requirements, costs, flexibility, speed, accuracy/error rate, potential for data loss, need for technical support, and equipment and/or software requirements. A discussion of their appropriate application to various kinds of studies is included, followed by examples of research studies using each of these methods.

Key words: medical informatics applications; data collection; handheld computers; Internet; surveys; questionnaires.
ACADEMIC EMERGENCY MEDICINE 2004; 11:1223–1228.

Data collection is the backbone of most research. It often takes on a life of its own, providing either success or failure to a clinical research project. Even with rigorous data protocols and management, some factors are beyond the investigator’s control. Errors due to transcription and lost data-collection forms may go unrecognized or uncounted. Human error in manual data entry is a concern, as is the potential for technical problems or equipment failure in automated methods of data collection. The ideal data-collection method would be inexpensive, easy to use, and applicable to widely varying types of studies. It would allow end-users who have received minimal training and who do not have a significant prior level of technical proficiency or education to use the method to enter data efficiently and in a consistent way that minimizes input errors and loss of data. Finally, the ideal method should be able to make use of relatively inexpensive equipment, or equipment that is already in use in a department, such as computer workstations or handheld computers. Although no single method is currently available that meets all of these requirements, researchers should be aware of existing and emerging technologies that allow data collection to be conducted in the most accurate and efficient manner possible.

This article reviews the capabilities, advantages, and disadvantages of three forms of automated data collection: scannable data forms (SDFs), Web-based forms (WBFs), and handheld computer (HC)-based forms, and compares them with manual data entry using paper forms. Each of these methods is evaluated based on the following criteria: ease of use, experience/educational level required of designer, end-user training requirements, costs, flexibility, speed, accuracy/error rate, potential for data loss, need for technical support, and equipment and/or software requirements. Brief examples of studies that have used each of these methods for data collection are given at the end of each section to illustrate their utility in actual research situations.

From the Department of Emergency Medicine, Mount Sinai School of Medicine (JSS, MJB, KMB, DFR, LDR), New York, NY; Los Robles Regional Medical Center (MJB), Thousand Oaks, CA; and the Department of Psychology, Montclair State University (DFR), Montclair, NJ. The Emergency Medicine Patients’ Access to Healthcare (EMPATH) Study was supported by Grant No. 38668 from the Robert Wood Johnson Foundation. Address for correspondence and reprints: Jason Shapiro, MD, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, Box 1620, New York, NY 10029. Fax: 212-426-1946; e-mail: [email protected]. doi:10.1197/j.aem.2004.08.017

SCANNABLE DATA FORMS

Scannable data forms, commonly referred to as “bubble forms,” are documents that can be read by a computer scanner in order to extract the data. They are paper-based and have been used for decades in standardized testing (e.g., SAT I and SAT II). Increasingly, other entities are using these forms as well. Professional organizations sometimes use customized versions of these forms as election ballots. The end results can be a quick tally of votes (in the case of elections), a score (in the case of standardized tests), or a database coded according to predetermined specifications (in the case of clinical research [1]).

Although relatively easy to use for respondents familiar with them, SDFs generally require greater sophistication than traditional paper instruments that are designed to be entered manually. If the data-collection instrument is an interview or structured chart abstraction, appropriate selection and training of staff should minimize errors. However, if the form is a self-administered questionnaire, SDFs may not be appropriate in populations such as emergency department (ED) patients and visitors, many of whom may not be familiar with such forms.

The process of creating a coded database with SDFs requires some advance planning and initial expense. Before scanning the coded SDFs, a file is developed that will allow the computer to input the scanned data from the forms in the appropriate fields in the database (Table 1). The end product can be a database written in American Standard Code for Information Interchange (ASCII), SPSS (SPSS Inc., Chicago, IL), or the format of another statistical package. If affordable, the work and expense may be worthwhile when compared with the hours of data entry and correction of entry-related errors that may be required if using the current standard method of data entry by hand using paper forms [2]. Although there is a decrease in the probability of human error in the data transcription and entry phase using SDFs, there is no evidence to suggest a difference in the speed or accuracy of data capture by the respondent or end-user filling out the form. This is a question that could be answered by future investigation.

These advantages are balanced by the lack of flexibility of the forms. SDFs require careful planning to fit the form within the constraints of available space. Furthermore, once the forms are formatted and printed, any changes to the forms require reprinting and normally incur additional charges. Scannable forms, like traditional paper forms, consist of loose, losable sheets of paper and so offer no advantage with respect to missing forms, space for data storage, or confidentiality issues.
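
As a rough illustration of the definition file mentioned above, the following Python sketch maps each scanned item to a database field and a coded value. The field names, the bubble letters, and the assumption that the scanner emits one character per item are all hypothetical; they are not taken from any particular scanning product or from the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Maps one scanned item to a database field (hypothetical layout)."""
    name: str       # database column name
    position: int   # zero-based index of the item in the scanner output
    codes: dict     # bubble mark -> coded value stored in the database


# Hypothetical codebook for a three-item form.
CODEBOOK = [
    FieldSpec("sex", 0, {"A": 1, "B": 2}),
    FieldSpec("has_pcp", 1, {"A": 1, "B": 0}),
    FieldSpec("satisfaction", 2, {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}),
]

MISSING = None  # stored when a bubble is blank, unreadable, or absent


def decode_form(marks: str, codebook=CODEBOOK) -> dict:
    """Turn one scanned form (one character per item) into a coded record."""
    record = {}
    for spec in codebook:
        # Items beyond the end of the scanner output are treated as blank.
        mark = marks[spec.position] if spec.position < len(marks) else " "
        record[spec.name] = spec.codes.get(mark, MISSING)
    return record


print(decode_form("ABD"))
```

Keeping the mapping in a single codebook means that a change to the form only requires a change to that table, and unreadable marks are stored explicitly as missing rather than guessed.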

Scannable data forms come in two styles: standard (referred to as general-purpose) and customized. The standard-version SDF is a structured page divided into columns of unequal widths. The larger space is reserved for the form questions and the remaining space is reserved for the precoded responses. SDFs use color-coded, horizontal bars to create a lined-page effect that assists the reader in linking the question to its correct set of precoded responses. These precoded responses are designed to accommodate questions that yield dichotomous, multiple-choice, or Likert-type responses. The response options are indicated by small ovals or circles. After reading the question, interviewers (or respondents) indicate the appropriate response by blackening the correct oval or circle with a no. 2 lead pencil.

There are some notable constraints when using the standard forms [3]. Items that entail lengthy directions, skip sequences, and long questions may consume considerable space on the form. This will reduce the number of questions that can be placed on a single page, thereby increasing the number of data forms (pages) needed for the study. The more data forms used per encounter, the greater the cost incurred. This cost is compounded because not only will the initial cost of the forms be greater, but the amount of time needed to complete the scanning process will also increase, raising the costs of scanning.

Customized SDFs allow the investigator to configure the questions and the page in a manner that enhances readability and facilitates the smooth delivery of the questionnaire. When using customized forms, the response codes can be placed above, below, or to the left of the question, if desired. The flexible placement of the response codes gives the investigator more freedom to create a visually appealing questionnaire, one that facilitates smooth delivery and minimizes confusion on the part of the interviewer or respondent. The ability to design a form to the specific space and configuration needs of a particular study also helps to reduce the number of pages needed for the complete form. Thus, while it may cost more at the outset, the customized forms can result in net savings if they reduce the total number of pages of the interview and thus the amount of time needed for scanning. This is particularly true when large numbers of forms are being scanned.

TABLE 1. Comparison of Data Collection Fields

                                                                Paper  SDF    WBF    HC
Design and implementation features
  Designer needs no technical expertise                         +      +      Ø      Ø
  Low instrument cost                                           +      Ø      Ø      Ø
  Instrument flexibility                                        Ø      Ø      +      +
  Little/no technical support                                   +      Ø      Ø      Ø
  No equipment or software required                             +      +      Ø      Ø
  Low distribution costs                                        Ø      Ø      +      +
End-user features
  Little/no advance training required                           +      Ø      +      Ø
  No computer skills required                                   +      +      Ø      Ø
  Speed of completion                                           Ø      Ø      +      +
Data issues
  Accuracy: no data entry errors                                Ø      +      +      +
  Low potential for data loss                                   Ø      Ø      +      +
  No manual data entry costs                                    Ø      +      +      +
  No data processing costs                                      Ø      Ø      +      +
  No extended secure storage of data forms                      Ø      Ø      +      +
  Multi-center trials: no shipping/faxing forms                 Ø      Ø      +      +
Special features
  Tracks response times                                         Ø      Ø      +      +
  Easily incorporates complex skip patterns                     Ø      Ø      +      +
  Easily incorporates extensive instructions and user support   Ø      Ø      +      +
  Prevents incorrect responses                                  Ø      Ø      +      +
  Prevents skipped questions                                    Ø      Ø      +      +
  Images, videos easily incorporated                            Ø      Ø      +      +

Paper = standard paper forms; SDF = scannable data forms; WBF = Web-based forms; HC = handheld computer forms.

SDF Case Study. The Emergency Medicine Patients’ Access to Healthcare (EMPATH) Study, a 28-hospital multicenter trial, used data collection in the form of patient interviews and chart reviews for all ED patients at the participating hospitals in a single 24-hour period in October 2002 [4,5]. SDFs were designed for both the patient interview and the chart extraction to eliminate errors normally associated with data entry by hand. The complex survey questions contained in the EMPATH questionnaire required a customized SDF design. With careful planning and construction, a patient interview and a chart review form were created on one double-sided page each. Although there were initial setup and design costs for the customized SDF, standard scannable forms for this project would have required several additional pages to fit the entire survey instrument, increasing overall costs due to printing, shipping, and scanning expenses.
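
The cost trade-off between standard and customized SDFs can be expressed as simple arithmetic. The sketch below is illustrative only: the function names and every number in it (setup cost, pages per form, cost per page, number of forms) are hypothetical placeholders, not figures from the EMPATH study or from any vendor.

```python
def total_cost(setup_cost, pages_per_form, n_forms, cost_per_page):
    """Setup cost plus per-page printing and scanning cost for all forms."""
    return setup_cost + pages_per_form * n_forms * cost_per_page


def break_even_forms(extra_setup_cost, pages_saved_per_form, cost_per_page):
    """Number of forms at which a customized design recovers its extra setup cost."""
    return extra_setup_cost / (pages_saved_per_form * cost_per_page)


# Hypothetical inputs: a 5-page standard form versus a 2-page customized form.
standard = total_cost(setup_cost=0, pages_per_form=5, n_forms=3000, cost_per_page=0.25)
customized = total_cost(setup_cost=1500, pages_per_form=2, n_forms=3000, cost_per_page=0.25)

print(standard, customized)
print(break_even_forms(extra_setup_cost=1500, pages_saved_per_form=3, cost_per_page=0.25))
```

Under these made-up inputs the customized form becomes cheaper once more than 2,000 forms are processed, which is the pattern the text describes for studies that scan large numbers of forms.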

WEB-BASED FORMS

Most people are at least familiar, if not comfortable, with the WBF. It is used to shop, pay bills, and register for myriad online services. For individuals who are familiar with using a computer, training requirements should be minimal; in fact, WBFs may be easier and faster to use than traditional paper forms. On the other hand, WBFs may be unsuitable for persons who are uncomfortable using a computer and, of course, Web-based surveys exclude respondents who do not have access to a computer.

Depending on the demands of a particular research project, designing and supporting a WBF may require a significant amount of expertise in Web site and database design and administration. Many departments and hospitals have personnel working for them who have experience in these areas. Those without institutional support can hire outside firms to create a WBF with a backend database for rates that vary with geographical location, degree of complexity of the project, and amount of time needed for design. The WBF can be stored on a local Web server at an institution, or can be outsourced to a commercial Web service provider at minimal cost.

Web-based forms have a tremendous amount of flexibility and are easily modified; they will accept a variety of inputs, and the data can be placed directly into the study database. WBFs can be designed to prevent incorrect entries and incomplete responses. Complex skip sequences can be accommodated, and the site can automatically measure the time it takes the subject to respond to the questions. Images, audio, video, and pop-up windows can be used to guide the end-user to fill out the WBF completely and accurately. Although some of these multimedia tools are available to the HCs discussed in the next section, none of them are available to the SDFs or to standard paper forms. WBFs can also be used to log and track the number of unique respondents, and collect demographic data on them through a simple registration process. In use, WBFs have been shown to be as reliable as paper forms and to have fewer data entry errors [6]. WBFs can enhance the efficiency and quality of multicenter studies [7] because data can be entered continuously without any time or geographical constraints.

There are many applications available for creating WBFs, including some free Web-based applications for WBF generation that require minimal or no programming experience [8–10]. There are also a number of commercially available software packages for development of WBFs and their backend databases. Because the data are entered directly into the database at the time the end-user collects them, data entry and processing costs are eliminated, and there is significantly less potential for data loss from lost paper (standard paper forms or SDFs), or even lost or broken devices that are temporarily storing parts of the data set between synchronizations by the end-user (HCs).
All of the data that are collected using a WBF reside on a server and database; this significantly simplifies data storage issues compared with keeping boxes of paper forms secure. If patient-sensitive data are being transmitted, the WBF can be set up with various levels of security and encryption to ensure data security. Another option is to have the WBF placed on the hospital or department’s intranet, and then require respondents to have a secure login either on-site or through a virtual private network client that many hospitals now have in place. This last measure should ensure compliance with the Health Insurance Portability and Accountability Act (HIPAA) regulations. The database that stores the data after collection and entry is no different with any of the methods discussed here, and all should have a system in place for adequate confidentiality protection and regular backup in case of a system failure.
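
A minimal sketch of how a WBF might reject incorrect or incomplete entries and enforce a skip sequence before data reach the study database is shown below. The questions, the answer options, and the `validate` function are hypothetical and stand in for whatever a given forms package or Web framework provides.

```python
# Hypothetical questionnaire: each item lists its allowed answers and,
# optionally, a rule that decides whether the item should be asked at all.
QUESTIONS = {
    "smoker": {"allowed": {"yes", "no"}},
    "packs_per_day": {
        "allowed": {"<1", "1-2", ">2"},
        # Skip sequence: ask only when the respondent reported smoking.
        "ask_if": lambda answers: answers.get("smoker") == "yes",
    },
    "age_group": {"allowed": {"18-39", "40-64", "65+"}},
}


def validate(answers: dict) -> list:
    """Return a list of problems; an empty list means the submission is acceptable."""
    problems = []
    for name, spec in QUESTIONS.items():
        applicable = spec.get("ask_if", lambda a: True)(answers)
        value = answers.get(name)
        if not applicable:
            if value is not None:
                problems.append(f"{name}: should have been skipped")
        elif value is None:
            problems.append(f"{name}: required but missing")
        elif value not in spec["allowed"]:
            problems.append(f"{name}: invalid value {value!r}")
    return problems


print(validate({"smoker": "yes", "age_group": "65+"}))
```

In this sketch a submission is stored only when `validate` returns an empty list; otherwise the form would be shown again with the listed problems, which is how skipped questions and out-of-range answers are prevented rather than cleaned up afterward.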

This data-collection method lends itself extremely well to remotely collected surveys targeting individuals who have access to a computer, because of the ease with which the survey can be distributed and with which responses and response rates can be tracked. The WBF can be posted on a Web site, and the uniform resource locator (URL) or “link” can be e-mailed directly to study participants, or the form itself can be sent as an e-mail. It may be a little more difficult to use than other methods for bedside data collection in clinical studies, because WBFs require the respondent to be at a computer with an Internet connection. Although some departments now have mobile wireless computers, this equipment is relatively expensive when compared with stationary personal computers (PCs), and has a greater potential for loss or theft in a busy ED.

WBF Case Study. The information technology (IT) survey of academic EDs in the United States was a WBF survey that was sent to all U.S. emergency medicine residency–affiliated EDs in Fall 2002. The WBF was created in Cold Fusion (Macromedia, San Francisco, CA). The URL or link to the WBF was e-mailed to all program directors, who were asked to fill out the survey. Once the data were entered by the program directors on the WBF, Cold Fusion was used to populate an Access database (Microsoft, Redmond, WA). Microsoft Excel was used for data management and tabulation, and SAS System 6.02 (SAS Institute, Cary, NC) was used for statistical analysis. The response rate from the initial e-mail was approximately 40%; subsequent e-mail and telephone follow-up rendered an overall response rate of 77%. The results were published in Academic Emergency Medicine in August 2003 [11].

HC-BASED FORMS

According to a report by W.R. Hambrecht & Co., in 2001, an estimated 15% of physicians in the United States used PDAs [12]. Recent data provided by handheld drug information provider ePocrates showed that they had 271,000 physicians including 17,000 emergency physicians registered to use their product, suggesting that a large percentage of emergency physicians own HCs (Laura Kaufman, personal communication, Oct 7, 2003). Because of the widespread use of HCs among health care professionals, these end-users should require only a minimum of training if they will be performing the data entry, but this method may not be well suited for use in a survey where the device may be used by patients or family members.

Handheld computers have been used for data collection in clinical trials extensively for the last 16 years; these trials range from small, single-center to very large, multicenter trials [13]. Like WBFs, HCs eliminate data entry and processing costs and have allowed multicenter surveys to be conducted even with limited financial resources [14]. Compared with WBFs, HCs offer the obvious advantage of portability. Data can be collected anywhere and then synchronized to a database running on a single workstation in a department’s office, or transmitted back to a central database via the Internet when used in multicenter trials with sites that are geographically separated. The forms can be created with a large degree of complexity using multiple pathways for navigation between forms. Additional features such as alarms, reminders, or time-stamps for data collection can be inserted into the forms to prompt the user to enter data at certain times, or to track the time it takes the user to complete the form. Other features such as in-line images and the ability to have reference material linked to the forms for end-user support are also available with HCs. Like WBFs, the HCs can be easily modified in an adaptive way that allows the tool to be developed and changed during the planning phase of a study. Handheld computers have been shown to produce data that are more complete [15], to be faster with fewer errors [16], and to be more consistent [17] than other forms of data collection.

One of the problems with HC use is battery life. Data collection may be restricted to a few hours with most Pocket PC models, or to one day with most Palm OS models. Another problem is the screen size. Some study participants or data collectors, particularly those aged 40 years or older, may find it difficult to view text and images on a 2 × 2-inch screen [18,19]. Handheld computers may require more technical knowledge to deploy and maintain than WBFs or SDFs.

There are several options for setting up an HC-based data-collection tool. The first is to custom-design an application in a programming language such as Java (Sun Microsystems, Santa Clara, CA), C++ (Microsoft), or Visual Basic (Microsoft). This method requires extensive programming experience and may require significant development time for the tool, but this option may offer the most flexibility in the tool’s design, allowing it to be created exactly as the investigator wishes. Another option is to use one of the many commercially available HC-based database tools, many of which are able to synchronize with a backend database such as Access. A final option is to use an HC form development platform that creates forms for the HC and creates tables for the temporary storage of data on the HC that will then synchronize with a backend database through a conduit. Some of these HC-forms tools can be set up with no programming, but for significant customization and to increase the flexibility in designing the data-collection tool, some programming may be necessary. Two form-development platforms for HCs with which the authors have experience are reviewed here: Pendragon Forms 4.0 (Pendragon Software Corp., Libertyville, IL) and Satellite Forms 5.2 (Pumatech
Inc., San Jose, CA). Both of these programs allow the end-user to synchronize data from his or her HC through a direct connection to a local computer or over the Internet to a server housing the database program. If patient-sensitive data are being transmitted, some security features are built into these products, but synchronization to a local computer would ensure HIPAA compliance. Both programs are available through their respective company Web sites; costs vary depending on the number of HCs used, the data transmission method, and the features purchased from the respective company or third-party software companies.

Pendragon Forms 4.0 is an easy-to-use forms generator built around an Access database. The Forms Manager component allows users to choose from any of 21 different field types in creating these forms. Some scripting can be done to custom-program certain tasks, such as embedded calculations or skip-sequence questions. The data are transferred from the HC directly to a table in Access on the PC through a conduit that is built into the software and requires no further programming. Although this software is relatively easy to use and often requires no programming, there are some drawbacks, such as character limitations in various fields and the inability of this version of the software to include images in the forms, which may be needed in some studies.

Satellite Forms 5.2 is a more powerful design platform, but it may require significantly more technical skill. There are a number of “controls” that are similar to the fields in the Pendragon software. These controls allow the user to place various items on forms such as text boxes, buttons, drop-down lists, and color or black-and-white bitmap images. The menus for these controls have various action properties that can be turned on to determine what happens when an end-user interacts with the form (e.g., pushes a button with the stylus). There is also the capability for extensive scripting to custom-program additional actions if the desired action is not available on the control’s menu. If the user wants to do something that is not covered by the included controls, such as beaming a form to a printer, custom controls can be programmed using C++, and there are many third-party add-on controls that can be purchased and downloaded from the Internet. One potential difficulty with using this software is that a conduit needs to be created for each project, and this may become quite complex when large projects are undertaken. An advantage, though, is that users may set up synchronization with multiple backend databases through an open database connectivity (ODBC) driver in Windows. This feature allows the forms to synchronize with more powerful server-based backend databases such as those created in SQL Server (Microsoft) or Oracle (Oracle Corp., Redwood Shores, CA).
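
The general idea of synchronizing records held temporarily on an HC with a backend database can be sketched as follows. This is not how the Pendragon Forms or Satellite Forms conduits are implemented; it is a simplified illustration that uses Python's built-in `sqlite3` module as a stand-in for both the device store and the backend, with a hypothetical table layout and record identifiers.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one table of collected records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "record_id TEXT PRIMARY KEY, answer TEXT, collected_at TEXT)"
    )
    return conn


def synchronize(device: sqlite3.Connection, backend: sqlite3.Connection) -> int:
    """Copy records from the device store to the backend; return how many were new."""
    rows = device.execute(
        "SELECT record_id, answer, collected_at FROM records"
    ).fetchall()
    before = backend.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    # INSERT OR IGNORE skips record_ids that already exist in the backend,
    # so repeating a synchronization does not duplicate data.
    backend.executemany("INSERT OR IGNORE INTO records VALUES (?, ?, ?)", rows)
    backend.commit()
    after = backend.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    return after - before


# Hypothetical records collected on one handheld device.
device, backend = make_db(), make_db()
device.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("hc1-0001", "yes", "2004-03-20T10:15"), ("hc1-0002", "no", "2004-03-20T10:40")],
)
device.commit()

print(synchronize(device, backend))  # first synchronization copies both records
print(synchronize(device, backend))  # second synchronization finds nothing new
```

Giving every record a unique identifier on the device is the design choice that makes repeated synchronization safe; in a multicenter setting the identifier would also need to be unique across devices.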

HC Case Study. A structured observational study of prescription errors as detected by pharmacy personnel in a community drug store was created in Pendragon Forms 4.0. Data were collected by a trained research assistant using an HC form with limited multiple-choice answers to short queries. This form was created by an emergency physician with no formal programming experience, and the research assistant was able to use the form successfully after 1.5 hours of training. The data were collected on three HCs and synchronized directly to a desktop computer using the cradle of the HC. There was no repetition of data or missed queries in the study. Publication of these data is still pending.

CONCLUSIONS

Scannable data forms, Web-based forms, and handheld computers can all facilitate data collection and entry in research studies. Each of these methods holds significant and distinct advantages over the current standard of using paper forms with manual data entry. Each automated tool has particular requirements, constraints, and advantages. Investigators must choose carefully depending on the population to be studied, the design of the study, the data to be collected, and the financial and human resources available. Whereas WBFs may be best suited for remote surveys, HCs or SDFs may be better suited for bedside data collection. Departments with the internal or institutional expertise to design WBFs or HC data-collection tools may find these solutions to be considerably less expensive than departments that must pay vendors for this service. More systematic and objective comparative investigation of the costs and data accuracy of each of these methods is needed to assist researchers in choosing the optimal data-collection method. With continuing innovations in technology and informatics there is also the possibility of developing new methods of automated data collection that would allow investigators to create data-collection tools that can be easily scaled and customized for various types of studies. Researchers should avail themselves of existing and emerging technologies that allow data collection to be conducted in a more rapid and efficient manner than with traditional paper forms.

References

1. Borque LB, Fiedler EP. How to Conduct Self-Administered and Mail Surveys. (2nd ed.). Thousand Oaks, CA: Sage, 2003.
2. Fink A. How to Manage, Analyze and Interpret Survey Data. (2nd ed.). Thousand Oaks, CA: Sage, 2003.
3. National Computer Services Pearson Web site: http://www.pearsonncs.com/surveytracker/. Accessed Mar 20, 2004.
4. Richardson LD, Ragin DF, Hwang U, et al. Emergency Medicine Patients’ Access to Healthcare (EMPATH) study: reasons for seeking care in the emergency department [abstract]. Acad Emerg Med. 2003; 10:514.
5. Richardson LD, Ragin DF, Hwang U, et al. Emergency Medicine Patients’ Access to Healthcare (EMPATH) Study: racial/ethnic, gender and age related differences in emergency department use [abstract]. Acad Emerg Med. 2003; 10:524.
6. Pettit FA. A comparison of World-Wide Web and paper-and-pencil personality questionnaires. Behav Res Methods Instrum Comput. 2002; 34:50–4.
7. Kublickas M, Bringman S, Westgren M. The Internet: a good web for gathering data for clinical trials and registries. Traditional paper data forms are soon completely “out.” Lakartidningen. 2003; 100:322–7.
8. Birnbaum MH. SurveyWiz and factorWiz: JavaScript Web pages that make HTML forms for research on the Internet. Behav Res Methods Instrum Comput. 2000; 32:339–46.
9. Programs to make HTML forms for research on the Web. Available at: http://psych.fullerton.edu/mbirnbaum/programs/. Accessed Mar 23, 2004.
10. WWW Survey Assistant Website: http://www.surveyassistant.com/. Accessed Mar 23, 2004.
11. Pallin D, Lahman M, Baumlin K. Information technology in emergency medicine residency-affiliated emergency departments. Acad Emerg Med. 2003; 10:848–52.
12. Rae-Dupree J. New hand-held computers get second life with new features. U.S. News. 2001; 130:48–50.
13. Koop A, Mösges R. The use of handheld computers in clinical trials. Control Clin Trials. 2002; 23:469–80.
14. Giammattei FP. Implementing a total joint registry using personal digital assistants. A proof of concept. Orthop Nurs. 2003; 22:284–8.
15. Reilly JC, Wallace M, Campbell MM. Tracking pharmacists’ interventions with a handheld computer. Am J Health Syst Pharm. 2001; 58:158–61.
16. Lal SO, Smith FW, Davis JP, et al. Palm computer demonstrates a fast and accurate means of burn data collection. J Burn Care Rehabil. 2000; 21:559–61.
17. Gupta PC. Survey of sociodemographic characteristics of tobacco use among 99,598 individuals in Bombay, India using handheld computers. Tob Control. 1996; 5:114–20.
18. Shapiro J, Bessette M, Levine SR, Baumlin KB. HandiStroke: a handheld tool for the emergent evaluation of acute stroke patients. Acad Emerg Med. 2003; 10:1325–8.
19. Lampe AJ, Weiler JM. Data capture from the sponsors’ and investigators’ perspectives: balancing quality, speed and cost. Drug Inf J. 1998; 32:871–86.
