Paper presented at the NIDI/IUSSP Expert Meeting on Demographic Software and Micro-computing, Strategies for the Future, The Hague, 29 June - 3 July 1992.

Data Processing Strategies: Hardware and Software for Data Entry, Editing, and Tabulating

Mathieu Pageau, Patrick Gerland, and Nancy McGirr *

Introduction

"Demography has always had a strong quantitative component and was one of the first social sciences to make use of early data processing equipment" (Strong, 1987, p. 183). The first major application of computer technology in the population field was the processing of population censuses. This paper begins by providing an overview of the evolution of computer and microcomputer technology, showing that recent advances, especially with microcomputers, have virtually eliminated any bottlenecks to census and survey processing caused by hardware deficiency or lack of access. This is followed by a brief history of the evolution of microcomputer software, especially as it pertains to census and survey processing. Later we focus specifically on software for data entry, editing, and tabulation.

Brief History of Computer Hardware

Computing history is closely associated with census data processing. The U.S. Census of 1890 was the stimulus for the invention and introduction of Hollerith's electro-mechanical data processing and sorting devices using punch cards: the first major statistical machines to be built and put into large-scale use. The time needed to process the census data was reduced from seven and a half years in 1880 to two and a half years in 1890, despite an increase of thirteen million people in the intervening decade. Although improvements were made to punch card processing through the World War II period, the machines did little more than manipulate the vast quantities of punched cards and were limited in speed, size, and versatility.

The real beginning of the modern computer era starts with two machines: the ENIAC (Electronic Numerical Integrator and Calculator) and the EDVAC (Electronic Discrete Variable Automatic Computer). These machines made use of the "stored program" concept of storing both instructions and data in the computer's memory. This represented the shift from mechanical and electromechanical devices that used wheels, gears, and relays for computing to devices that depended upon electronic parts such as vacuum tubes and circuitry for their operations. The ENIAC was a huge machine; its 18,000 vacuum tubes took up a space eight feet high and eighty feet long. It weighed thirty tons and used 174,000 watts of power.

* Computer Software and Support for Population Activities Project, United Nations Department of Economic and Social Development. The views expressed in this paper are those of the authors and do not necessarily reflect those of the United Nations. Mention of firm names or products does not imply the endorsement of the United Nations.


Improvements in computer capabilities are grouped in generations based upon the electronic technology available at the time. The refinement of computer technology has focused on increased speed, reduced size and power requirements, and lower costs.

• The first generation of computers, based upon the designs of the ENIAC and EDVAC, began with the production and sale of the first computer designed for commercial use: the UNIVAC. First-generation computers (1951-58) relied on vacuum tubes and used magnetic drums for memory. Punch cards were used for input and output; processing speed was slow and memory limited; overheating and maintenance problems were frequent. Instructions were coded in machine language until symbolic languages (mnemonic symbols representing instructions) were invented.

• Second-generation computers (1959-64) were characterized by increased memory capacity (magnetic core memory) and transistors for internal operations and power. Memory was supplemented by storage on magnetic tapes and disks. Processing speed and reliability increased, with important reductions in size and heat generation. More sophisticated, English-like languages such as COBOL and FORTRAN were commonly used. Batch-processing operating systems had come into use, though they were aimed principally at maximizing the productivity of large and expensive computers rather than aiding the efforts of individual users.

• The invention of the integrated circuit on silicon chips for internal operations led to the third computer generation (1965-70). These machines were even smaller, faster, and more powerful than computers of earlier generations. They were also less costly, more reliable, and used less electricity. In those days shifting to a new computer normally implied having to abandon or rewrite existing application programs because of hardware incompatibilities. This situation led IBM in 1964 to introduce a range of compatible machines of varying power and capacity, the System/360 series, to replace their previous distinct types of mutually incompatible scientific and commercial computers. This strategy of offering a range of compatible machines was soon followed by other manufacturers. Other developments in this period included minicomputers and the emergence of the software industry. In the early seventies, the cost of an IBM 360 model 30 was about half a million dollars. That computer had 32K of RAM and about 10M of disk storage. In the mid-1970s, the most powerful computer was the IBM 370 model 165, with 1M of RAM and removable disks, each containing about 10M of data. Statistics Canada processed the 1975 Census on that kind of computer. Benin processed the 1979 Census on a Honeywell/Bull 60 minicomputer with 256K of RAM and 20M of disk storage in removable units. That computer cost about $250,000 and was used solely to process the census data.

• Fourth-generation computers (1971-today) rely on large-scale integration for internal operations and the development of microprocessors, or "computers on a chip". Trends in miniaturization led to increases in speed, power, and storage capacity and to the introduction of microcomputers and supercomputers. A microcomputer is a computer built around a single-chip microprocessor. Less powerful than minicomputers and mainframe computers, microcomputers have nevertheless evolved into very powerful machines capable of complex tasks. Technology is progressing so quickly that state-of-the-art microcomputers are as powerful as mainframes of only a few years ago.
The advances in microcomputer technology are the result of rapid technical progress in the computing and digital electronics industries, coupled with enough standardization to enable broad-scale popularization, and this progress is expected to continue unabated.

Microcomputers first appeared in the late 1970s. One of the first and most popular was the Apple II, introduced in 1977 by Apple Computer. During the late 1970s and early 1980s, new models and competing operating systems were being developed, no standard having yet emerged. In 1981, IBM entered the fray with its first microcomputer, known as the IBM PC. This machine was configured with 16K of RAM (expandable to 256K) and two 360K floppy disk drives and cost about $5,000. The IBM PC quickly became the personal computer of choice, and most other early manufacturers fell by the wayside. One of the few companies to survive IBM's onslaught was Apple Computer, which remains a major player in the microcomputer marketplace. Other companies adjusted to IBM's dominance by building clones, computers that were internally almost the same as the IBM PC but that cost less.

The next step in the evolution was the IBM XT, which included a 10M hard disk as standard equipment. In just a few short years, faster processors emerged, leading to the development of the IBM AT, with 640K of RAM and 40M of hard disk storage. Today it is possible to find microcomputers with 8M of RAM and 200M of disk storage for about $2,000. Portable computers running on batteries and weighing about three kilos can have up to 16M of RAM and 100M of hard disk storage. These computers can be bought for about $4,000. See Figure 1 for a brief summary of microcomputer hardware development.

By the late 1980s, it was possible to process a national population census for a country with a population of less than 10 million persons using microcomputers (Toro and Chamberlain, 1988). For example, the 1985 Census of Burkina Faso was processed using 28 IBM microcomputers of varying sizes and capabilities, with Bernoulli cartridges and magnetic tapes for data storage (United Nations, 1989, p. 24).

Figure 1. Summary of Microcomputer Hardware Development

1977     Apple II
1981     IBM-PC
1983     IBM-XT
1984     IBM-AT (80286)
1986     Laptops
1987     386-based micros and IBM-PS/2
1988     Notebooks
1990     Palmtops
1991     486-based micros
1992-93  586-based micros

As we can see, microcomputer evolution has occurred in a very short period of time. Concerning processing speed, the ENIAC was able to perform about 0.01 million instructions per second (MIPS). The IBM 360 model 30 performed about 0.05 MIPS; the IBM 370 model 165, about 0.4 MIPS. The CPU that equipped the IBM XT, the Intel 8088, performed about 0.1 MIPS when running at its normal speed. The 80286 processor performed about 0.4 MIPS, and the existing 80386 can process between 0.7 and 3.0 MIPS. The new 80486 has the capacity of processing more than 4.0 MIPS, and we are already preparing for the 80586 processors that will perform more than 7.0 MIPS. New developments include reduced instruction set computer (RISC) chips, which can perform even more instructions per second.

With the increasing speed of microcomputers today, new operating system software has been developed to make computing more attractive to people outside data processing. DOS was and still is intimidating to many users. WINDOWS and the new OS/2 Presentation Manager are more attractive because they use a Graphical User Interface (GUI). A well-designed GUI can free the user from learning complex command languages and makes it easier to use various applications, since consistency is ensured in the interface design of each piece of software.

The multiplicity of microcomputer users created a need to share printers and disk space. The computer industry came up with the idea of Local Area Networks (LANs) that link computers together and allow them to share less-used resources such as printers and communication cards (IRMA, modem) as well as frequently used files on disk. Novell is one of the pioneers of Local Area Networks; Microsoft (with LAN Manager) and IBM (with Token Ring) were quick to follow. Most recently developed is the Wide Area Network (WAN), where the network is not restricted to a single room or a single building but may span a whole country, linked via a communication line. When using a LAN or WAN, each microcomputer can access data that reside on a file server, which passes the data around on a demand basis. LAN or WAN technology allows for greatly increased power in data processing and greater data integrity: appropriate software in a network or multiuser environment authorizes workstations to access a specific data file in order to read it, but only one workstation, if any, has the power to change the contents of the file at any one time. This prevents inconsistent updates from being performed concurrently. More sophisticated systems provide locks against concurrent updating at the record or household level, rather than locking the entire file.

With the advent of LAN technology and, prior to that, of smart workstations, cooperative processing came into use. Cooperative processing refers to splitting application work across multiple platforms or multiple machines, and choosing the best place to execute each specific part of the application. A distributed data base management system (DBMS), in which data are found in multiple locations, uses cooperative processing to gather all the data necessary to satisfy a specific request.

This short history of computer hardware has shown that hardware is no longer a bottleneck for processing censuses and surveys in any country.

Overview of Census and Survey Processing

"At the outset of the 1970 World Population and Housing Census Programme in 1965 computers were still prohibitively expensive for many countries, and their storage and processing capacities were only a small fraction of what they are today. Nevertheless, the data processing technology used in many developing countries in the 1970 round was unnecessarily primitive; costly data were unedited, undertabulated and destroyed. Census data and survey publications often were seriously delayed since many needless man-years and calendar-months needed to be spent developing and testing tailor-made computer programmes for edit and tabulation capabilities that should have been immediately available from some thoughtfully designed and easily available software packages.
During the 1970 programme, nothing like a data edit package was even attempted, and what few tabulation packages there were offered only limited help" (Lackner and Shigematsu, 1977, p. 265).


In many respects, the processing of census and survey data has remained unchanged. We first collect the data using prepared questionnaires that are filled in manually, then proceed to a manual count of the number of cases and persons to obtain preliminary results. The data are coded and sent for data entry. Data entry is done as fast as possible using operators who automatically enter what they see. Some may use questionnaire photography to capture data, some may use Optical Character Recognition (OCR) devices or scanners, but the idea is to capture the data as fast as possible to produce a preliminary count for the census or survey.

In the editing phase, we remove errors and duplications and include omissions, using the original questionnaires as a source to update the data file. This is an iterative procedure. We run the editing program to find anomalies, correct them, and rerun the editing program until the data file is clean. After several cycles, we often impute values to remove remaining errors and inconsistencies from the file. Imputation is based on statistical formulas and normally uses the cold-deck (less often than before) or hot-deck methods to impute values (a small illustrative sketch of the hot-deck idea is given below). When the data file is considered clean, we produce statistical tables to show the results.

While relying on specifications from subject-matter specialists, programmers have been needed to write dedicated programs for all phases of data editing, imputation, and tabulation. During the 1970s and 1980s, efforts were made by the U.S. Bureau of the Census, the United Nations Statistical Office, and other organizations to develop and provide generalized editing and tabulation programs. While the generalized programs could be used for more than one application, the data processing staff was heavily involved in writing the specifications for the programs. With the transition to microcomputer-based processing, these programs were often downsized or ported to microcomputers, sometimes with the addition of special features to make them somewhat more user-friendly. In addition, new programs were being written specifically for microcomputers to cover these as well as other neglected areas in the data processing loop, notably data entry programs. In the early 1980s, several programs appeared on the market as dedicated intelligent data entry software. See Diskin (1986) for a review of some of these products.

In the transition to microcomputer-based processing systems, Data Base Management Systems (DBMS) have also offered considerable possibilities in the data entry and file management areas. DBMS were originally created for the mainframe and later ported or rewritten for mid-range computers and micros. DBMS allow census or survey data to be centralized in databases so that we can better manipulate and exploit the data. A database is a collection of related data. DBMS software allows programs to be written that store and/or retrieve data in the database. Someone first needs to describe to the DBMS the data structure and how it is related. This information, called metadata, is kept in a data dictionary. Depending on the chosen DBMS, the metadata are kept in a separate file or in the same file as the data, as is the case for DBF files. A DBF file contains a variable-length header that describes how the data are kept in the file. The newer DBMS software has often included a form design component, making such systems even more attractive for census and survey processing.
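To make the hot-deck idea mentioned above more concrete, the following minimal sketch (in Python, with entirely hypothetical field names, codes, and matching variables) replaces a missing value by the value most recently observed for a similar record; production imputation systems apply far more elaborate matching and edit rules than this.

# Minimal illustration of hot-deck imputation: a missing value is replaced
# by the most recently seen valid value from a "similar" record (the donor),
# where similarity is defined by a few matching variables.
# All field names and codes below are hypothetical.

MISSING = 9  # hypothetical code for "not stated"

records = [
    {"sex": 1, "age_group": 3, "employment": 2},
    {"sex": 1, "age_group": 3, "employment": MISSING},  # to be imputed
    {"sex": 2, "age_group": 4, "employment": 1},
    {"sex": 2, "age_group": 4, "employment": MISSING},  # to be imputed
]

def hot_deck_impute(records, target, match_vars, missing=MISSING):
    """Replace missing values of `target` with the last valid value seen
    for a record with the same combination of `match_vars`."""
    donors = {}  # matching-variable combination -> last valid value
    for rec in records:
        key = tuple(rec[v] for v in match_vars)
        if rec[target] != missing:
            donors[key] = rec[target]          # update the donor cell
        elif key in donors:
            rec[target] = donors[key]          # impute from the donor
        # if no donor exists yet, the value is left for a later pass
    return records

hot_deck_impute(records, target="employment", match_vars=["sex", "age_group"])
for rec in records:
    print(rec)

A cold-deck variant would simply initialize the donor table from an earlier census or survey instead of building it from the current file.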
The form design component allows users to draw the questionnaire on screen and use the computer to enter data directly on the form. The program logic writes the data into a specified file format, usually compatible with widely used DBMS, so that it is possible to exploit the data immediately using the existing interfaces provided by the DBMS. The main problem with intelligent form design software is that it is very limited in the way the data are processed. The software may allow for range checks, but that is about all. One reason for the lack of editing capabilities is that the software is designed primarily to draw forms. For more intensive editing, we need to turn to programmers to write dedicated programs. In addition, the DBMS provides little flexibility for tabulating and presenting the resulting data without extensive programming to get the data in the desired format.

The most recent trend in software development for census and survey data processing has been toward writing integrated, generalized software for the data entry, editing, and tabulation phases of censuses and surveys. For example, the U.S. Bureau of the Census has developed IMPS, Macro Systems' Institute for Resource Development has developed ISSA, and the United Nations DESD has developed PC-EDIT/XTABLE. Statistics Canada also developed a database-oriented system for data editing and imputation called GEIS. See Cotton (1988) for a preliminary assessment of these packages in their early stages of development and implementation. The next sections of this paper will focus on the logic behind generalized integrated data processing software, especially as it pertains to data entry, editing, and tabulation. Emphasis will be given to the new directions in data processing resulting from continued advances in microcomputer technology and software development.

Data Entry and Editing Software

In census and survey work, statisticians and demographers have given more attention to questionnaire design, sampling, and data collection than to procedures for data entry, coding, and editing. Until recently, these procedures were usually carried out at a central location, distant in both time and space from the place where the data were collected. With pressures to speed up data processing in order to produce quick results, more errors are likely. With the advent of portable computers, this is changing, and data entry can take place at the time data are collected, thus contributing to improved data quality.

The timing and type of editing to be done during and after data entry become a choice between data quality and speed. The conventional wisdom with respect to this issue is that for population censuses, the volume of data, the time pressure for producing complete national results, and the generally temporary nature of the data entry work force argue strongly for doing little significant editing at the time the data are entered and for delaying error detection and correction until after data entry, accomplishing it by automatic methods (imputation of missing or erroneous values). The cost of adopting such a policy is that errors cannot be corrected by an examination of the source questionnaire, with a resulting (invisible) loss of data quality. The reason that it has been possible to pay such a price is that the volume of data is large, the error rate is generally low, and it is believed that current methods of error correction have acceptably limited bias. However, the data processing strategy for large-scale data entry should not be applied to smaller-scale data collection efforts such as post-enumeration checks, complex sample surveys, multi-round and follow-up surveys, vital and civil registration, and administrative data in general.

Data edit rules are fundamental in improving the overall quality and consistency of the ultimate outputs. Formulating editing rules for computer applications is generally a complex process that requires considerable substantive work.
Such a process depends upon experience in subject-matter areas such as demography, education, and labor markets, as well as country- and region-specific knowledge.
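As an illustration of the kind of edit rules being discussed, the short sketch below encodes a range check, a skip rule, and a between-variable consistency check; the variables, codes, and cut-off values are invented for the example and are not taken from any particular questionnaire or package.

# Hypothetical edit rules of the kind formulated by subject-matter specialists:
# a range check, a skip pattern, and a between-variable consistency check.

def check_record(rec):
    """Return a list of (field, message) pairs describing rule violations."""
    errors = []

    # Range check: age must lie within an accepted interval.
    if not 0 <= rec["age"] <= 98:
        errors.append(("age", "age outside accepted range 0-98"))

    # Skip rule: children below a cut-off age should have no marital status.
    if rec["age"] < 12 and rec["marital_status"] is not None:
        errors.append(("marital_status", "should be blank for persons under 12"))

    # Consistency check: a mother should be older than her eldest child by at
    # least a minimum number of years (illustrative cut-off only).
    if rec.get("age_eldest_child") is not None:
        if rec["age"] - rec["age_eldest_child"] < 12:
            errors.append(("age_eldest_child", "inconsistent with mother's age"))

    return errors

sample = {"age": 10, "marital_status": 2, "age_eldest_child": None}
for field, message in check_record(sample):
    print(f"{field}: {message}")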


Interactive environments such as microcomputers offer the possibility of improving the process of refining the set of edit rules and of using them directly at the data entry stage, leading to an improvement in final data quality. Intelligent data entry software allows for the development of data entry applications, data entry itself, verification, additional secondary editing, data modification, and the collection of statistics on the data entry operations. With intelligent data entry, only valid values can be entered during data entry. As data are keyed, constants are automatically entered, skips are implemented, range checks occur, and auto-increments and consistency checks are performed according to the program logic. For exceptional cases, an option is provided to force a value into a field that does not pass the edit checks. A new development is to include on-line look-up tables internal to the data entry program that can be used to assign codes to written responses on the questionnaire. For example, with PC-EDIT/XTABLE, value labels are defined using XTABLE. The list of values can be displayed in a window during data entry using PC-EDIT, and a code is entered in the data file by selecting the label corresponding to the written response.

With intelligent data entry, data are basically clean immediately after data entry is completed. Secondary editing, involving complex internal consistency and structure checks that require the review of several sections of the questionnaire and need to be corrected according to detailed recommendations, can be done interactively by more advanced supervisors. Summary statistics pertaining to the editing process can be produced in batch processing, per field and per edit rule. Individual error listings can be used to locate cases in error and modify fields with incorrect values.

The new software for survey data entry and editing needs to be simple to use so that data processing staff need not be heavily involved. (This will not be possible in some countries, but it will at least decrease the time it takes to develop the data entry and data editing programs.) It should also provide for either internal or external coding tables so that the coding and transcription phase prior to data entry can be eliminated. While this may slow down the data entry operator somewhat, it should not be a major problem if data entry operators are trained on the purpose of the survey and the content of the questionnaire. It is a trade-off between faster data entry with longer data editing, and slower data entry resulting in virtually clean data files. With the new software and powerful microcomputers, it should be possible to accommodate both schools of thought. The total amount of time from the beginning of data entry to the production of clean statistical tables must be considered, not just the amount of time devoted to any one phase of census and survey processing.

Field Data Entry

One method of minimizing the number of steps involved in data collection is to collect and record the data at the same time the interview occurs. Skips, range, consistency, and structure checks can be done interactively "on the spot". Some experimentation has been done with this procedure. For example, data from the Guatemala DHS were collected using keyboard entry with laptop computers and the ISSA software module for forms design and data entry.
Overall the experiment was a success, shortening the time required to collect the information and improving the quality of the data owing to immediate feedback (Cantor and Rojas, 1991).

Palmtops with traditional keyboards provide another interesting, cost-effective alternative for small data collection/entry operations. Compared to full-size computers or laptops, palmtops are limited, but they can be practical for specific applications such as small surveys or continuous small-scale data collection requiring fast processing, for example pilot programs or health and epidemiological community-based sentinel surveys. Palmtops are often DOS-compatible machines with 40-column screens. Newly developed low-cost, small-size RAM and ROM storage cards of up to 1 MB will in the near future give palmtops a high degree of expandability and allow for greater data storage capability for data entry than currently exists (see Matzkin, 1991).

PC-EDIT, produced by the UN DESD, can be used on either laptop or palmtop computers. PC-EDIT is designed for almost any microcomputer that runs DOS. The system needs only a text-mode display screen (at least 40 characters per line) and works with less than 256 KB of RAM (at least 64 KB required). It requires limited storage space for programs. Layouts can be defined on desktop computers and used on laptops, notebooks, or palmtops for field data entry, using a simplified questionnaire and easy code selection with the new XTABLE interface to define and select labels instead of codes only. The data entry module of the PC-EDIT package is being adapted to work on an Atari palmtop computer with 128 KB of RAM and a 128 KB card of external memory that operates on three AA batteries. The screen display is adapted to the Atari's 40-column/8-line screen. A prompt mode has been added so that each individual question can be displayed on screen (one at a time) to resemble the questionnaire format.

Notepad-sized computers with pen-based input devices have appeared on the market. Handwriting recognition is still in its early stage of development, far too expensive and fragile for field data collection/entry at this time, but it offers an array of future possibilities; see Cantor and Rojas (1991) for more detail. Most encouraging from the data processing point of view, however, is the reduction in the number of steps involved in data entry. Notepad computing with handwriting recognition would allow screen questionnaires to be filled in by pen at the point of interview using intelligent data entry software, eliminating manual field checks, questionnaire transport, office checks, and data entry, including much of the consistency editing, since most errors would be discovered at the time of interview.

Statistical Tabulation Software

Tabulation is basically the process of reading the data and aggregating individual values into appropriate cells of a table. Aggregating data is and always will be a job oriented to batch processing. There is no need for operators to watch the machine while it is aggregating data; the machine reads and processes the file on its own, unlike in data entry, where the machine has to wait between keystrokes. Tabulation processing can be refined and algorithms modified to produce tables faster, but the processing required to design table formats, read the data, and print the results is more difficult to improve.

In order to manage the large amounts of data generated by a survey or census, many analysts and researchers have relied on a statistics package or on commercially available data base management software. In addition to their expense, statistical packages require large computer resources (memory, storage space), handle fewer cases, are sometimes limited to processing single-volume files, and fall short of multi-dimensional data handling capabilities.
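To show what aggregating individual values into the cells of a table amounts to in practice, the following minimal sketch builds a two-way frequency table in a single pass over the records; the variable names and codes are hypothetical.

# A two-way cross-tabulation built in one sequential pass over the records:
# each record increments exactly one cell, so the cost is proportional to the
# number of records read, not to the number of cells in the table.

from collections import defaultdict

def cross_tab(records, row_var, col_var):
    """Count records into cells keyed by (row value, column value)."""
    cells = defaultdict(int)
    for rec in records:
        cells[(rec[row_var], rec[col_var])] += 1
    return cells

# Hypothetical micro-data: sex (1 male, 2 female) by literacy (1 yes, 2 no).
records = [
    {"sex": 1, "literate": 1},
    {"sex": 1, "literate": 2},
    {"sex": 2, "literate": 1},
    {"sex": 2, "literate": 1},
]

table = cross_tab(records, "sex", "literate")
for (row, col), count in sorted(table.items()):
    print(f"sex={row} literate={col}: {count}")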
Since the main outputs generated for most censuses and surveys are frequency distributions and cross-tabulations, dedicated data tabulation software offers most users a more cost-effective alternative to general statistical analysis software. Statistical packages should be reserved primarily for more advanced statistical procedures not available in standard cross-tabulation packages.

Tabulation packages fall into two main categories. The first type is oriented to producing camera-ready tables, with extensive formatting, labeling, and spacing requirements. These packages often require detailed preparation of the table formats themselves, and specifications for the processing that will actually put the data in the individual table cells and perform any additional manipulations such as calculating percentages, means, or medians. Examples of this type of tabulation software used for census and survey processing include CENTS from the U.S. Bureau of the Census and TPL from the U.S. Bureau of Labor Statistics / QQQ Software. These packages were downsized from mainframe or mid-range computers for use on microcomputers. The problem with downsized programs is that they are still batch-oriented, in that table format and syntax processing is done in the same run as the data processing, not interactively with the computer. Because of the special syntax required for describing the format and content of the tables, data processing personnel need to write the code and produce even the tables needed by statisticians and demographers. Many cross-tabulations are produced only to verify data accuracy or to support research. These tables are "working tables" that the end-user (statistician or demographer) should be able to produce easily without data processing assistance.

The other category of tabulation packages tends to be those with few detailed formatting requirements; the only requirement is to specify the content of the table cells by listing the variables that will be used and their hierarchical order in the table. In these programs, table formats are often defined interactively and the system will not allow incorrect syntax. No programming assistance is required. The resulting tables are produced with standard general formats. Occasionally there are options for editing labels and titles or changing column widths, but there is little flexibility for improving the appearance of the table format itself. Often, the individual tables can be imported into other software from which the appearance can be improved (spreadsheet, word processing, desktop publishing). Tables produced through statistical packages are good examples of this type of package. Some of the newer tabulation packages developed specially for microcomputers try to bridge the gap between the two types of packages. These are often part of integrated and generalized systems that have been designed specifically for microcomputers: QuickTab, XTABLE, ISSA, etc.

Tabulation software needs to be able to produce template tables by reading the data very fast. Processing raw data is one of the most time-consuming tasks of a computer. Tabulation packages also should be able to process data that reside on multiple files/volumes. Data files on microcomputers cannot span more than one volume, so it is important to have the capability of processing data from multiple files because of the file space constraints that exist for microcomputers. Most developing countries cannot afford WORM media to store large quantities of data, so they must rely on normal media such as floppy disks, hard disks, and cartridges.
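Processing data that span several files or volumes reduces to the same one-pass logic: each volume is streamed in turn and its records are accumulated into the same set of cells. The sketch below assumes, purely for illustration, that the volumes are simple comma-separated text files.

# Accumulating one cross-tabulation over data split across several files
# (for example, one file per enumeration area or per diskette). The
# comma-separated layout used here is purely illustrative.

import csv
import os
import tempfile
from collections import defaultdict

def tabulate_files(paths, row_var, col_var):
    """Stream every file in `paths` once, adding each record to its cell."""
    cells = defaultdict(int)
    for path in paths:
        with open(path, newline="") as fh:
            for rec in csv.DictReader(fh):
                cells[(rec[row_var], rec[col_var])] += 1
    return cells

if __name__ == "__main__":
    # Create two tiny hypothetical "volumes" so the sketch is self-contained.
    tmpdir = tempfile.mkdtemp()
    volumes = []
    for name, rows in [("area01.csv", ["1,1", "1,2"]), ("area02.csv", ["2,1", "2,1"])]:
        path = os.path.join(tmpdir, name)
        with open(path, "w") as fh:
            fh.write("sex,literate\n" + "\n".join(rows) + "\n")
        volumes.append(path)

    for (row, col), count in sorted(tabulate_files(volumes, "sex", "literate").items()):
        print(f"sex={row} literate={col}: {count}")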
Using the microcomputer technology and software advances available to us for intelligent data entry, editing, and tabulation, interactive data analysis becomes possible as well (Weatherby et al., 1985). At any point in the data entry or editing phase, it is possible to produce preliminary results to check data quality. For any area or batch of data, it should be possible to produce frequencies and cross-tabulations using the raw data for preliminary results, as well as for recoded data and index construction (if the latter items have been built into the data entry and editing program). Results for each geographic unit then only need to be summed for each administrative area of the country. Using this approach, preliminary results can be obtained as fast as, if not faster than, with the more traditional approach of separate data entry, coding, editing, imputation, and tabulation phases of census and survey processing.

Once aggregated results have been prepared with the tabulation software, further statistical and demographic analysis can take place. A wide variety of software packages exist that make use of aggregated data for further analysis, ranging from spreadsheet software to elaborate programs for calculating demographic indices and projections. These end products are often the reason why the data were collected in the first place. To the extent that we can use microcomputer technology and software advances to speed up and improve the collection and processing of data to obtain timely and accurate information for decision-making, we should.

Concluding Remarks

Given recent advances in microcomputer technology and software development for census and survey data processing, it would seem that neither hardware nor software will pose problems for the collection and rapid processing of data. Intelligent selection of hardware and software to accomplish the tasks at hand is a given. The particular hardware configuration, software selection, and data processing strategy chosen to collect and process data will often depend on the volume, detail, and complexity of the information.

Despite the tremendous advances, however, computers and software cannot do all the work. Highly qualified and trained personnel are needed to perform the tasks of questionnaire construction and data processing. Increasingly, these personnel will not necessarily be data processing specialists; rather, demographers and statisticians may be taking on more of the responsibilities, especially for smaller-scale efforts. Recent developments also suggest that there is no substitute for careful design of the questionnaire to ensure not only that it obtains the required information, but also that it can be easily and properly processed using the advances in intelligent data entry and editing software. As technology dictates that enumerators also will be entering and editing data at the point of collection, greater training needs to be devoted to these skills as well as to general interviewing skills. Careful training will have to be given to data entry operators as they increasingly take on the jobs of coding and editing as a result of advances in intelligent data entry software. Microcomputer software requires greater control and checking of the way the data entry process is organized. File management, back-up, communications, and security will assume increasing importance.

References

Byte (1992). The Future of Pen Computing (Roundtable). Part 1: March 1992, Vol. 17, No. 3, pp. 115-118; Part 2: April 1992, Vol. 17, No. 4, pp. 99-102.

Cantor, D. and Rojas, G. (1991). Future Prospects for Survey Processing: Technologies for Data Collection. DHS World Conference Proceedings, Washington, D.C., August 1991, Vol. 2, pp. 1357-1371.

Cotton, P. (1988). A Comparison of Software for Editing Survey and Census Data. Symposium '88: The Impact of High Technology on Survey Taking, Ottawa, October 1988.

Cushing, J. (1991). DHS Data Processing Strategy: Advantages and Disadvantages. DHS World Conference Proceedings, Washington, D.C., August 1991, Vol. 2, pp. 1329-1335.

Diskin, B. (1986). An Evaluation of Microcomputer Data Entry Software Packages for Use in Developing Countries for Census and Survey Processing. Computer Software Branch, International Statistical Programs Center, U.S. Bureau of the Census, Washington, D.C.

Eames, C. and Eames, R. (1990). A Computer Perspective (Background to the Computer Age). New edition. Harvard University Press, Cambridge, MA, and London, 176 p.

Ferry, B. et Cantrelle, P. (1989). Les Expériences Nouvelles en Matière de Collecte Intégrée de Données sur Micro-ordinateurs Portatifs [New experiences with integrated data collection on portable microcomputers]. IUSSP International Population Conference 1989, New Delhi, 20-27 September 1989, pp. 47-62.

Lackner, M. and Shigematsu, T. (1977). Some Statistical Data Processing Software for Small Computers. Bulletin of the International Statistical Institute XLVII (1): 265-276.

Matzkin, J. (1991). Palmtop PCs: Power by the Ounce. PC Magazine, Vol. 10 (13): 197-244.

Runyan, L. (1991). 40 Years on the Frontier. Datamation, Vol. 37, No. 6, March 15, 1991, p. 34.

Strong, M. (1987). Software for Demographic Research. Population Index 53 (2): 183-199.

Toro, V. and Chamberlain, K. (1988). The Use of Microcomputers for Census Processing in Developing Countries: An Update. Symposium '88: The Impact of High Technology on Survey Taking, Ottawa, October 1988.

United Nations (1989). The Use of Microcomputers for Census Data Processing. UNFPA/INT88-PO9/1, New York.

Weatherby, N., Fenn, T., and Elkins, H. (1985). Microcomputers in Developing Countries: Interactive Data Entry, Editing, and Analysis. Working Paper No. 15, Center for Population and Family Health, Columbia University, New York.
