Potentials and Pitfalls of Integrating Data From Diverse Sources: Lessons from a Historical Database for Great Lakes Stream Fishes

There is considerable enthusiasm for, and value in, the development and analysis of large databases that integrate physical and biological data from diverse sources and over broad spatial, temporal, and taxonomic scales. There also are special challenges. We introduce the Biological Impacts of Low-head Dams (BILD) historical database, developed for assessing the impacts of small barriers used in the control of parasitic sea lamprey (Petromyzon marinus) on assemblages of stream fishes throughout the Great Lakes drainage basin. We also highlight challenges encountered in developing the database. Considerable effort and care were required while designing and analyzing the database because of variability in the information the contributing agencies had collected, as well as how the information had been collected, organized, and stored. Furthermore, only a small portion of the data is suitable for addressing the impacts of small barriers on stream fish assemblages. We therefore provide general recommendations for developing databases integrating data from diverse sources and provide cautions about expectations for them. Our overview is intended to assist fisheries scientists, managers, and funding agencies asked either to develop an historical database, or to provide data or funding for one.

Robert L. McLaughlin, Leon M. Carl, Trevor Middel, Marlene Ross, David L. G. Noakes, Daniel B. Hayes, and Jeffrey R. Baylis

Robert L. McLaughlin is a research associate in the Department of Zoology and the Institute of Ichthyology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1, 519/824-4120 x3544, [email protected].
Leon M. Carl is a research scientist for the Ontario Ministry of Natural Resources, Aquatic Ecosystems Research Section, Peterborough, Ontario, Canada, 705/755-2288, [email protected].
Trevor Middel is a biologist for the Ontario Ministry of Natural Resources, Aquatic Ecosystems Research Section, Peterborough, Ontario, Canada, 705/755-1553, [email protected].
Marlene Ross is an M.S. candidate in the Department of Zoology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1, 519/824-4120 x6096, [email protected].
David L. G. Noakes is a professor in the Department of Zoology and the Institute of Ichthyology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1, 519/824-4120 x2747.
Daniel B. Hayes is an associate professor in the Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA 48824-1222, 517/432-3781, [email protected].
Jeffrey R. Baylis is an associate professor in the Department of Zoology, University of Wisconsin, Madison, Wisconsin, USA 53706-14990, 608/263-5134, [email protected].


There is considerable enthusiasm for the development of databases bringing together scientific information from diverse sources. The enthusiasm has arisen for at least three reasons. First, scientists and resource managers recognize that databases are valuable for making scientifically-defensible decisions regarding fish stocks and their environment. Second, database approaches can increase the integrity and consistency of the data, encourage data sharing among researchers with different areas of expertise, and facilitate the transfer of data among different application programs used for analysis (Harvey and Press 1996). Third, developments in statistics, such as meta-analysis, are improving greatly our ability to summarize what has been done, to examine questions at broader spatial, temporal, and taxonomic scales, and to plan future research (e.g., Osenberg et al. 1999). These developments mean that scientists and resource managers whose primary training may emphasize skills in fisheries management, fish ecology, environmental issues, and policy making, will be asked more frequently to be involved in the construction or management of large databases, or to contribute data or funds to them. On one hand, this is desirable because fisheries scientists are likely to be “closer” to the data and methods of collection, and therefore can improve the quality of the design and analysis of the database (Van Alstyne et al. 1995). On the other hand, this can be problematic if the scientists are unfamiliar with database design and management.

Indeed, in some areas of biology, scientists collecting large amounts of data may have to turn them over to outsiders specializing in the design and analysis of large databases (Reichhardt 1999). While collaboration between scientists and experienced database developers is preferred (e.g., Campbell 1999), it is not always possible or pursued. Furthermore, developing databases is a complex task despite software advances encouraging greater involvement by individuals inexperienced with database development. Such issues necessitate greater discussion of the potentials and pitfalls of databases in fisheries research. Other disciplines have sought to address these issues to varying degrees (e.g., Gross et al. 1995; NRC 1995; Harvey and Press 1996), but it remains unclear whether their conclusions are widely known or transferable across disciplines because of differences in research objectives, project size, funding, and spatial and temporal scope. We therefore highlight challenges we encountered while developing and analyzing the Biological Impacts of Low-head Dams (BILD) Historical Database. First we describe the database and what it was developed to do. Then we describe the challenges encountered during its development and analysis, and provide recommendations for others who may undertake projects integrating data from diverse sources. Our intention is to assist other scientists, managers, and funding agencies who may be asked to be involved in projects of a similar nature.

The BILD Historical Database and Rationale

The BILD historical database was developed as part of a basin-wide project to assess the impact that low-head barriers (0.4–2.0 m in height; see photos) have on assemblages of stream fishes found throughout the Great Lakes drainage basin (see www.axelfish.uoguelph.ca/research/BILD.htm for more information).

The Great Lakes Fishery Commission (GLFC) is considering expanded use of low-head barriers as an alternative method of controlling parasitic sea lamprey (Petromyzon marinus). During the early 1900s, sea lamprey invaded the upper Great Lakes through shipping passages and were responsible, in part, for population crashes of large fishes such as lake trout (Salvelinus namaycush). Since 1958, sea lampreys in the Great Lakes have been controlled by periodic treatment of rearing streams with the larval lampricide, 3-trifluoromethyl-4-nitrophenol (TFM). In 1992, the GLFC pledged to reduce its reliance on TFM by 50% because of public concern regarding the use of chemicals (GLFC 1992). Low-head barriers represent one alternative method of control being considered. These barriers deny adult sea lamprey access to their spawning grounds in streams, thereby restricting TFM treatment to the section of stream below the barrier, reducing the amount of TFM used and the number of nontarget fishes exposed to the chemical. Unfortunately, the impact of these small barriers on other taxa of stream fishes has not been assessed. The historical database was developed to complement an extensive field survey conducted in the summer of 1996 which examined the species composition of 24 matched pairs of barrier (with barrier) and reference (without barrier) streams across the Great Lakes basin (Hayes et al. MS, Porto et al. 1999). Strengths of the field survey include its paired design and standardized sampling protocol. A weakness of the field survey is the absence of time-series data, because the magnitude of any impacts could be time dependent (e.g., Tilman et al. 1994). Given that considerable time would be required to carry out a proper before-after-control-impact paired series design (e.g., Bence et al. 1996), we hoped that the historical data would provide useful time series to augment our field survey.

Figure 1. Design of the BILD historical database indicating the data tables, relationships between tables, and fields (variables) used as keys in the relationships.


An older design of low-head barrier on Days River, Michigan (top) and a more recent design with fishway on Cobourg Brook, Ontario (below).


The historical database combines data from five agencies: Department of Fisheries and Oceans Canada (Sea Lamprey Control Centre), Michigan Department of Natural Resources, Ontario Ministry of Natural Resources (Ontario Fish Information System and Natural Resource Information Branch), Wisconsin Department of Natural Resources (Wisconsin Fish Distribution Survey Database), and U.S. Fish and Wildlife Service (Sea Lamprey Management Program). The database was developed using Microsoft Access and a schematic of the design is provided in Figure 1. The database contains information on four general topics: (1) the locations and physical flow characteristics of sample streams throughout that portion of the Great Lakes drainage basin where sea lamprey are likely to breed, (2) the location and specifications for any barriers present, (3) the locations and timing of specific samples collected from each stream, and (4) the composition and structure of stream fish assemblages sampled from those sites, along with details of sampling activity and effort (Table 1). The information on stream fishes includes more than 26,000 sample surveys for 184 streams across the Great Lakes basin (Figure 2). Surveys for some streams extend back to the early 1900s, but most were conducted in the 1980s (Figure 3).
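To give a concrete sense of the relational structure just described, the following is a minimal sketch of how a few such tables might be linked through key fields, together with the kind of question the design ultimately has to answer. The table and field names are simplified illustrations invented for this sketch, not the actual BILD schema, and Python's built-in sqlite3 module stands in here for Microsoft Access.

```python
import sqlite3

# A simplified, hypothetical subset of the design in Figure 1: streams contain
# sample sites, sites are visited by dated samples, and each sample yields
# species catch records. Key fields link the tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stream (
    stream_id   INTEGER PRIMARY KEY,
    stream_name TEXT,
    great_lake  TEXT                 -- lake to which the stream is tributary
);
CREATE TABLE barrier (
    barrier_id  INTEGER PRIMARY KEY,
    stream_id   INTEGER REFERENCES stream(stream_id),
    year_built  INTEGER,
    height_m    REAL
);
CREATE TABLE site (
    site_id       INTEGER PRIMARY KEY,
    stream_id     INTEGER REFERENCES stream(stream_id),
    above_barrier INTEGER            -- 1 = above barrier, 0 = below, NULL = unknown
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    site_id     INTEGER REFERENCES site(site_id),
    sample_year INTEGER,
    gear        TEXT,
    effort      REAL                 -- e.g., shocking time or area sampled
);
CREATE TABLE species_catch (
    sample_id    INTEGER REFERENCES sample(sample_id),
    species_code TEXT,
    n_caught     INTEGER
);
""")

# The kind of end-user question the design must support: catches by species,
# split into samples taken before and after barrier construction.
query = """
SELECT sc.species_code,
       CASE WHEN s.sample_year < b.year_built THEN 'pre' ELSE 'post' END AS period,
       SUM(sc.n_caught) AS total_caught
FROM species_catch AS sc
JOIN sample  AS s ON s.sample_id = sc.sample_id
JOIN site    AS t ON t.site_id   = s.site_id
JOIN barrier AS b ON b.stream_id = t.stream_id
GROUP BY sc.species_code, period;
"""
for row in con.execute(query):
    print(row)
```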

The Challenges

Database Planning

The first significant challenge we encountered was to draft a project plan where we (1) identified the specific goals and objectives of the database, (2) identified who the end users (beyond ourselves) likely would be, (3) identified potential sources of data, (4) selected the software package with which to implement the database, and (5) laid out the schedule for the project's completion. Although the need for planning is a truism, inadequate planning or failure to adhere to a plan can waste time and resources and lead to an inferior database (Michael 1991; NRC 1995; Harvey and Press 1996). Our plan reduced the open-endedness of the data compilation process, which could have been a formidable problem with an area the size of the Great Lakes drainage basin and with a large variety of data sources.

For example, our plan helped us avoid spending extra time acquiring data that were not directly relevant to the impacts of small barriers on stream fishes. In addition, it led us to focus on the main federal, state, and provincial agencies as sources of data because they could provide large volumes of data with the least effort. Sources for smaller datasets, such as Ontario conservation authorities, journal publications, and graduate theses, were not pursued because of the added effort required to obtain fewer data and because of the increased possibility of data duplication if the sources had shared their data with federal, state, or provincial agencies. When contacting the agencies, our plan helped us clearly articulate the type of data we were looking for and how we intended to use it. Our plan also exposed the uncertainty surrounding the time needed to complete the database project. We had relatively limited resources and did not know how much data were available. In addition, we could not use methods designed for estimating the extent of data from literature searches (Harvey and Press 1996).

Table 1. Synopses of information contained in tables comprising the Biological Impacts of Low-head Barrier Dams (BILD) Historical Database.

1. Abundance
• estimated population size for an area
• type of estimate (Leslie, Zippin, Carle and Strub)
• standard error of the population estimate
• total biomass in area sampled
• population density
• biomass density

2. Activity
• year and season of sampling
• gear used (e.g., electrofisher, net, electric weir, lamprey trap, weir, etc.)
• source of data
• organization responsible for data collection
• type of data available (presence, abundance, mass, length)
• year of sample relative to year of barrier construction

3. Barrier
• number of barriers
• natural or fabricated
• distance to confluence or stream mouth
• geographical references (latitude/longitude, UTM coordinates)
• date of last construction
• discharge at barrier
• presence of lamprey traps and jumping pools

4. Effort
• number of samples on a given day (single vs. multi-pass electrofishing)
• shocking time
• area sampled

5. Sample
• collection method (type of electrofisher, seine, net, poison, etc.)
• date
• geographic references (latitude/longitude, UTM coordinates, county)
• description of water flow
• water and air temperature
• weather at collection time
• sample area
• original source of data
• sampling problems

6. Sample Summary (for electric weir and lamprey trap data)
• start and stop days
• period of operation
• number and types of traps operated

7. Sites
• geographical references as above
• location relative to a barrier

8. Species Catch
• species code
• number, weight, and length of fish caught
• biomass
• mark/recapture information

9. Species Summary (for electric weir and lamprey trap data)
• sum of catch by species and weight over duration of sampling

10. Stream
• Great Lake to which the stream is a tributary
• geographical references for the stream mouth (as above)
• presence and type of barrier
• spring and annual discharge

11. Watershed
• watershed code and name
• map of watershed, if available


Figure 2. Geographical distribution of streams included in the BILD historical database.

We ambitiously planned to complete the project in six months. It actually took 17, and some decisions are still outstanding (see Database Administration). Finally, our plan also helped us begin designing the database with end users in mind. One key step involved specifying the information we required to test whether low-head barriers affect stream fish assemblages, including the species and numbers present at a site, the location of the site relative to any barrier present, and the timing of the sample relative to the barrier construction. Another key step was ensuring that we retained information on factors that might confound the interpretation of our analyses, such as differences in sampling method and effort. We recognized that differences between agencies were likely and that their careful consideration would be important during the analysis, and therefore design, stages. An additional step involved deciding whether the data would be consolidated into one database or dispersed among several linked databases (e.g., Beard et al. 1998; Hale et al. 1998). We opted for the former because of limits on the resources and expertise available and because this option was considered more practical based on the small number of expected end users. A final step involved selecting the database management system. In general, this decision requires careful consideration given the number of database packages available, their strengths and weaknesses, and continual software developments. As a rule, it is key that the management system be adequate to handle the amount and types of data collected and to meet the project goals both immediately and into the future.


Other important considerations include the cost of the system, its availability to potential end users, the ease and flexibility of data entry, database implementation, and data retrieval, and its suitability for programming any corresponding applications (Connolly and Begg 1999).

Database Design and Integrity

The second, and perhaps greatest, challenge we faced was accommodating the differences inherent among the various data sets. This is a common problem for databases integrating data from diverse sources (NRC 1995; Beard et al. 1998). The differences existed at a variety of levels. One was the different forms and formats of the information we were provided. Data from the Michigan Department of Natural Resources, for example, had to be entered manually from original data collection sheets, while data from the other agencies were provided electronically. Manual entry is more time consuming and error prone than electronic processing, but easier for back-checking specific entries. Electronic files acquired from various agencies also differed in the software used to create them, and translation of these files was not always as seamless as one might expect. Lastly, we had to accommodate differences in how contributing agencies had customized their databases. For instance, the various agencies used different codes to identify fish species, and none of the individual coding systems was adequate to accommodate all of the species present in the combined datasets. We therefore devised our own coding system. In addition, the source databases differed in how data were formatted (e.g., dates), classified (e.g., gear types, weather), and reported (e.g., measurement units, geographical referencing systems). They also varied in the thoroughness of the documentation (metadata) provided (Gross et al. 1995; NRC 1995). Some potentially useful information, such as electroshocker settings and numbers of netters, was not available. There also were inherent differences among the surveys contained within each of the agency databases. The Wisconsin Fish Distribution Survey, for example, included collections made from over 50 sources, such as Fish Distribution Survey personnel, university scientists and students, power corporations, and commercial and recreational fishers.

Even within sampling programs, such as Sea Lamprey Control, the sampling protocols have changed over time. The individual sample surveys therefore varied in their design and original purpose, the expertise of the personnel carrying them out, the field collection methods employed, and the information reported (e.g., spatial references, sampling effort, abundances and lengths of species, and the precision of taxonomic identifications). Such inherent differences affect the quality of the data, where quality is rigorously defined as “the totality of features and characteristics of a product or service that bears on its abilities to meet the stated or implied needs and expectation of the user” (ANSI/ASQC 1994). For our purposes, we distinguish between the quality of the data at the time we received it (primary quality) and the quality of the data after we incorporated it into the BILD Historical Database (secondary quality). In terms of primary quality, we assumed that data received from contributors were free of errors. Although this is not recommended (e.g., NRC 1995), it was necessary given the resources available and considered reasonable given that the data already had passed the agencies' own quality control procedures. We took six general steps to ensure the secondary quality of the data and the integrity of the database. First, we verified, record by record, all imported and manually-entered data against the original datasets. Second, we took extra effort to ensure that fields (columns) possessing the same names in different source datasets contained the same information in the same format, and that fields with different names but containing the same data were unified. Accepting the structure of inherited databases at face value is not recommended (Hernandez 1997), and the potential problems (poor design, insufficient data integrity) that can arise are compounded when combining datasets from multiple sources. Third, we included information available regarding the different collecting agencies and dataset sources, as well as factors affecting the efficiency of their sample surveys, such as sampling method, effort, conditions, problems, etc. Fourth, we spent considerable time improving geographical references for streams and sample sites on streams.

Fifth, we had graduate students and upper-level undergraduate students, some trained in computer sciences and others trained in ichthyology, test the database. Finally, we solicited feedback from control experts and potential end users at two workshops.
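As an illustration of the kind of recoding and unification this work involved, the sketch below maps agency-specific species codes onto a single project-wide code and normalizes dates reported in different formats. The agency names, codes, and formats are invented for the example and are not the actual BILD conventions.

```python
from datetime import datetime

# Hypothetical per-agency species codes mapped onto one project-wide code.
SPECIES_CODE_MAP = {
    ("AGENCY_A", "081"): "BKT",   # brook trout
    ("AGENCY_B", "SAFO"): "BKT",
    ("AGENCY_A", "076"): "RBT",   # rainbow trout
    ("AGENCY_B", "ONMY"): "RBT",
}

# Date format each agency is assumed (for this example) to have used.
DATE_FORMATS = {
    "AGENCY_A": "%m/%d/%Y",
    "AGENCY_B": "%Y-%m-%d",
}

def harmonize(record):
    """Return a copy of a raw survey record recoded to project-wide conventions."""
    agency = record["agency"]
    return {
        "agency": agency,
        "species": SPECIES_CODE_MAP[(agency, record["species"])],
        "date": datetime.strptime(record["date"], DATE_FORMATS[agency]).date(),
        "n_caught": int(record["n_caught"]),
    }

raw = [
    {"agency": "AGENCY_A", "species": "081",  "date": "06/14/1987", "n_caught": "12"},
    {"agency": "AGENCY_B", "species": "SAFO", "date": "1987-06-14", "n_caught": "7"},
]
print([harmonize(r) for r in raw])
```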

Database Administration

A final challenge of the development process concerns administration of the database. Administration includes how the information is disseminated to users, who grants access to users, and who maintains the database, its documentation, and its security. Ideally, these issues should be addressed at the beginning of the project to ensure adequate mechanisms and resources are available for administrative tasks following development of the database. Responsibility and potential costs of administering the BILD database rest with the GLFC; however, some issues remain unresolved, in part because the GLFC has no formal policy on the maintenance and administration of databases created by research projects it funds. One of the easier administrative tasks is deciding the best format in which to make the database available to potential users. If the database is large and updated regularly, then online dissemination may be favored, particularly if broad data sharing over the Internet is desirable. For distributed databases this option may be technically challenging (e.g., Beard et al. 1998). If the database is relatively small and static, then compact disks or Zip disks may be suitable alternatives.

Figure 3. Numbers of streams sampled per year during the 1900s.


Figure 4. Frequency distribution depicting the number of streams that were sampled a given number of years.


For the time being, the BILD database is available from the GLFC on Zip disk, although it may be made available on the GLFC website at a later date (G. Christie, GLFC, pers. comm.). Probably the most onerous administrative task is determining who owns the database or, at least, who will be responsible for granting access to it. The management of intellectual property in digital environments is an area of much ethical and legal uncertainty and contention in both Canada and the United States (e.g., Fishbein 1991; Reichhardt 1998; Fortier et al. 1999). Issues related to ownership and access are most likely to arise if the project involves multiple collaborators from different institutions, is funded by a granting council, and includes contributions of data from multiple sources (Hilgartner and Brandt-Rauf 1994; Harvey and Press 1996). The BILD database is no exception. Indeed, database administrators from some of the contributing agencies we contacted had concerns about allowing us access to their data. There are no easy solutions here, but we make three recommendations. First, potential data contributors need to weigh their enthusiasm to manage and analyze their data further against the time and resources available for doing so and against the broader interest served by creating a larger database for analysis by a larger community of users. Second, database developers need to be clear about project objectives and to acknowledge more strongly the contributions made by individual sources.

Third, to avoid conflict after the fact, database administrators need to adhere to the project objectives and, when changes are needed, communicate these changes to the contributors. Furthermore, if the database is made available to a wide set of users, contributors should be included in this set. To what extent the BILD database will be updated is unclear. While there is considerable momentum within the scientific community to keep databases “alive” (Reichhardt 1999), reflection regarding the BILD database is worthwhile. Some databases are developed and used for synoptic purposes only rather than ongoing research or assessment (Michael 1991). In addition, the decision to keep the database alive should depend on useful criteria, such as the quality of metadata (documentation) regarding the component surveys; the rarity, time length, and analyzability of the data; and the scale across which sites have been sampled and their relocatability (Gross et al. 1995). In this respect, it is reasonable for the GLFC to consider the results of the BILD project and the prospects for its barrier program before committing to a long-term database project. Also, should the GLFC barrier program proceed, any future research or assessment data may be better stored in separate, linked databases (e.g., Beard et al. 1998; Hale et al. 1998) for reasons of logistics and ownership (Van Alstyne et al. 1995).

Database Analysis

Due to variation in the quality of the data and the evenness of sampling, analysis and interpretation of the historical data have also been challenging. The limitations of the data are considerable, and five limitations are pertinent to any temporal analyses of fish assemblages in general. First, despite the large volume of data, suitable time series are not available for most streams. Indeed, for 35% of the streams, only one sample survey was available, and only 38% were surveyed in more than four years (Figure 4). Second, there were also gaps in many of the time series; 51 of the 119 streams sampled more than once had at least one interval between successive samples of 10 years or more. Comparisons over longer time periods are more challenging to interpret because of the many changes that have occurred in the Great Lakes during the last century.

Third, spatial and seasonal variation among collections made on the same stream in different years adds to the year-to-year variation observed, because of habitat differences and seasonal movements of fishes, and can impede detection of consistent temporal changes. Fourth, the efficiency of the sample surveys differed considerably among streams, owing to the different survey methods and effort employed by the various agencies. It also varied over time, owing to temporal changes in survey methods within an agency. Finally, the comparability of surveys is also affected negatively by among-survey differences in the precision of taxonomic identifications, particularly for non-game fishes.

There are further limitations affecting the suitability of these data for specifically examining changes in fish assemblages following construction of a barrier. Fifty-nine of the streams had man-made barriers on them, but historical records for pre- and post-construction periods were available for only five streams from our extensive field survey: four barrier streams and one reference stream. An additional 17 barrier streams and 17 reference streams had some historical data, but with the former the time series were restricted to the period following barrier construction. In addition, only 15 of the 73 barrier (man-made and natural) streams had geographical references precise enough to distinguish surveys made below versus above the barrier location, and only 3 had surveys conducted above the barrier, the fragmented habitat where impacts such as local extinctions are most likely to occur.

Because of these limitations, testing whether low-head barriers impact stream fish assemblages is proving to be much more difficult with the historical data than with the data from our field survey where assemblages were sampled using a standardized protocol. In addition, while there are statistical methods for accommodating heterogeneity in data quality due to among-survey differences in methodology and design (e.g., Osenberg et al. 1999) and differences in sampling effort (Fisher et al. 1943), these methods cannot compensate for the general paucity of adequate time series. Furthermore, these meta-analyses are more time consuming because, to avoid unconscious biases, it is recommended that criteria developed for the selection of data be tested empirically rather than accepted a priori (e.g., Englund et al. 1999).

Table 2. General recommendations for creating a database integrating data from diverse sources.

1. Project Planning
• establish whether a database approach is appropriate (Barquin et al. 1997)
• specify project objectives and potential end users
• identify potential sources of data and the systems used for data storage
• estimate the volume and types of data to be handled (e.g., Harvey and Press 1996)
• consider alternative database designs (consolidated vs. distributed, synoptic vs. perpetual entry) (e.g., Harvey and Press 1996; Connolly and Begg 1999)
• schedule the project
• estimate resource requirements and costs, including software and hardware, and maintenance and administration after the database is developed (e.g., Harvey and Press 1996; Connolly and Begg 1999)

2. Data Acquisition
• consult resources on integrating data from diverse sources (e.g., Cooper and Hedges 1994; NRC 1995)
• acquire data and corresponding documentation (metadata)

3. Database Design and Implementation
• collaborate with an experienced database developer, if possible
• consult resources on designing databases, otherwise (e.g., Hernandez 1997; Connolly and Begg 1999)
• review project objectives and resource requirements
• design the database in light of project objectives and avoid adopting example or inherited designs (Hernandez 1997)
• establish procedures to ensure data quality and database integrity (e.g., Harvey and Press 1996; Hernandez 1997)
• implement the database design using selected software and hardware
• test the design (e.g., Connolly and Begg 1999) and revise accordingly
• develop the corresponding metadata (see Gross et al. 1995)

4. Database Administration
• identify a database administrator, if not done in 1
• develop policies on database administration and data sharing, if not done in 1

5. Data Analysis
• revisit project objectives
• consult general guidelines for interrogating databases (Harvey and Press 1996)
• consult resources on statistical analysis of heterogeneous data (e.g., Milliken and Johnson 1984; Osenberg et al. 1999)
• conduct analyses as per project plan
• test preconceptions regarding criteria for data selection (Englund et al. 1999)
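The screening of streams for usable time series described above, identifying which streams have more than one survey year and no long gaps between surveys, amounts to a simple pass over the survey records. A minimal sketch of that kind of screening follows; the stream names, survey years, and thresholds are invented for illustration and are not the BILD data or selection criteria.

```python
# Hypothetical survey years by stream (invented data, not BILD records).
survey_years = {
    "Stream A": [1952, 1978, 1989, 1990, 1995],
    "Stream B": [1988],
    "Stream C": [1980, 1983, 1986, 1991, 1994],
}

def summarize(years):
    """Return the number of survey years and the longest gap between successive surveys."""
    ys = sorted(set(years))
    longest_gap = max((b - a for a, b in zip(ys, ys[1:])), default=0)
    return len(ys), longest_gap

for stream, years in survey_years.items():
    n, gap = summarize(years)
    usable = n > 1 and gap < 10   # illustrative thresholds echoing the text
    print(f"{stream}: {n} survey years, longest gap {gap} yr, usable={usable}")
```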

21

For those streams where reasonable time series were available (regardless of sample location relative to the barrier), we are now examining whether there were consistent changes in the species composition following construction of a barrier. These changes will be compared with any temporal changes observed in streams without barriers, which could reflect impacts due to factors other than barriers, such as alterations in land use or varying water levels in the Great Lakes. The changes also will be compared to differences observed in the species composition of barrier and reference streams in analyses of our field survey data. In addition, the time series are being used to estimate probabilities of local extinction and colonization for individual species in barrier and reference streams (e.g., Clark and Rosenzweig 1994).
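As a rough illustration of how presence/absence time series can yield such probabilities, the sketch below simply counts transitions between successive surveys for one hypothetical species in one stream. It is not the Clark and Rosenzweig (1994) estimator, which is designed to handle sporadic, irregularly spaced surveys; the data and the per-interval interpretation here are invented for the example.

```python
# Naive transition-frequency estimates of local extinction and colonization
# from a presence/absence series ordered by survey year (invented data).
# This simple count ignores unequal intervals between surveys, which the
# Clark and Rosenzweig (1994) approach accounts for.
presence = [1, 1, 0, 0, 1, 1, 1, 0]   # 1 = detected, 0 = not detected

transitions = list(zip(presence, presence[1:]))
ext_opportunities = sum(1 for a, _ in transitions if a == 1)   # species present
col_opportunities = sum(1 for a, _ in transitions if a == 0)   # species absent

p_extinction = sum(1 for a, b in transitions if a == 1 and b == 0) / ext_opportunities
p_colonization = sum(1 for a, b in transitions if a == 0 and b == 1) / col_opportunities

print(f"extinction probability per interval:   {p_extinction:.2f}")
print(f"colonization probability per interval: {p_colonization:.2f}")
```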

Conclusion

Considerable enthusiasm remains for the BILD historical database. At this time, it represents the best historical information available for detecting changes in the composition of stream fish assemblages following the construction of low-head lamprey barriers. At the very least, evaluation of the limitations and uncertainties of the data should assist with assessment protocols developed for any future barrier construction projects. More broadly, there are plans to use the database to develop fish faunal regions for the Great Lakes basin as part of a project testing the GLFC's Interim Policy on Barrier Siting. In addition, there are plans to use the weir and trap data to identify fishes that may require consideration for passage at low-head lamprey barriers. We expect the same enthusiasm exists for other projects to create databases integrating diverse sources of information. Such enthusiasm needs to be balanced by greater recognition and appreciation of what such a project entails and the potential limitations of the product. Two cautions, in particular, are worth bearing in mind.

The first caution is that integrating data from diverse sources into a database is a conceptually sophisticated, resource-intensive task (Batra and Sein 1994; NRC 1995; Harvey and Press 1996), an issue that can get overlooked in the current atmosphere of enthusiasm. This caution, along with the challenges identified above, may be obvious to experienced database developers; however, software advances encourage greater participation by individuals who are inexperienced with database design, and novices are more likely to make significant errors than experts (e.g., Batra and Davis 1992). Moreover, current database software cannot guarantee a well-developed database any more than current word-processing software can guarantee a well-written essay. Accordingly, general recommendations for integrating and analyzing data from diverse sources are provided in Table 2, along with references where more specific recommendations can be found for each step.

The second caution is that compiling volumes of related information does not necessarily translate into a large amount of data suitable for answering a specific research question, owing to issues of data quality, evenness in sampling, and hence suitability for analysis. We recognize that historical databases can have an important role in fisheries research and management (e.g., Moyle 1997; Kaiser 1999). We also recognize that there have been statistical developments facilitating the synthesis of data and findings from independent studies. But, in addition, we stress the need for careful consideration of any suggestion that compilation of existing data will provide a suitable, less expensive alternative to a new, properly designed study. Databases promise to become an increasingly important part of fisheries research and management (Schnute and Richards 1994). Greater discussion of the potentials and pitfalls of databases integrating data from diverse sources is therefore in the interest of project proposers, granting agencies, and end users. It will ensure that such projects are employed effectively, planned carefully, and completed successfully.

Acknowledgments

We thank Gavin Christie, Randy Eshenroder, and Chris Goddard (Great Lakes Fishery Commission) for their assistance at various stages during this project's development; the Department of Fisheries and Oceans Canada (Sea Lamprey Control Centre), Michigan Department of Natural Resources, Ontario Ministry of Natural Resources, Wisconsin Department of Natural Resources, and U.S. Fish and Wildlife Service (Sea Lamprey Management Program) for granting us access to their data; and Ellie Koon (U.S. Fish and Wildlife Service, Sea Lamprey Management Program), Bob Randall (Department of Fisheries and Oceans Canada), and two anonymous reviewers for their efforts to improve earlier drafts of the manuscript. This project was supported financially by the Great Lakes Fishery Commission. Copies of the BILD historical database can be obtained from: G. Christie, Great Lakes Fishery Commission, 2100 Commonwealth Blvd., Suite 209, Ann Arbor, Michigan, USA 48105-1563.


References

ANSI/ASQC (American National Standards Institute/American Society for Quality Control). 1994. American National Standard: specifications and guidelines for quality systems for environmental data collection and environmental technology programs. ANSI/ASQC E4-1994. American Society for Quality Control, Milwaukee, WI.
Barquin, R. C., A. Paller, and H. Edelstein. 1997. Ten mistakes to avoid for data warehousing managers. Pages 145-156 in R. C. Barquin and H. A. Edelstein, eds. Planning and designing the data warehouse. Prentice Hall, Upper Saddle River, NJ.
Batra, D., and J. G. Davis. 1992. Conceptual data modelling in database design: similarities and differences between expert and novice designers. International Journal of Man-Machine Studies 37:83-101.
Batra, D., and M. K. Sein. 1994. Improving conceptual database design through feedback. International Journal of Human-Computer Studies 40:653-676.
Beard, T. D., Jr., D. Austen, S. J. Brady, M. E. Costello, H. G. Drewes, C. H. Young-Dubovsky, C. H. Flather, T. W. Gengerke, C. Larson, A. J. Loftus, and M. J. Mac. 1998. The multi-state aquatic resources information system. Fisheries 23(5):14-18.
Bence, J. R., A. Stewart-Oaten, and S. C. Schroeter. 1996. Estimating the size of an effect from a before-after-control-impact paired series design. Pages 133-149 in R. J. Schmitt and C. W. Osenberg, eds. Detecting ecological impacts: concepts and applications in coastal habitats. Academic Press, New York.
Campbell, P. 1999. Don't leave the biology out of bioinformatics. Nature 401:321.
Clark, C. W., and M. L. Rosenzweig. 1994. Extinction and colonization processes: parameter estimates from sporadic surveys. American Naturalist 143:583-596.
Connolly, T. M., and C. E. Begg. 1999. Database systems: a practical approach to design, implementation, and management. Addison-Wesley, New York.
Cooper, H. M., and L. V. Hedges. 1994. The handbook of research synthesis. Russell Sage Foundation, New York.
Englund, G., O. Sarnelle, and S. D. Cooper. 1999. The importance of data-selection criteria: meta-analyses of stream predation experiments. Ecology 80:1132-1141.
Fishbein, E. A. 1991. Ownership of research data. Academic Medicine 66:129-133.
Fisher, R. A., A. S. Corbet, and C. B. Williams. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12:42-58.
Fortier, P., D. N. Beaudry, M. Brown, T. A. Brzustowski, R. Douville, J. Levy, R. C. Miller, Jr., J. W. Murray, and C. Simson. 1999. Public investments in university research: reaping the benefits. Report of the Expert Panel on the Commercialization of University Research, presented to the Prime Minister's Advisory Council on Science and Technology. Government of Canada, Ottawa, Ontario.
GLFC (Great Lakes Fishery Commission). 1992. Strategic vision of the Great Lakes Fishery Commission for the decade of the 1990's. Great Lakes Fishery Commission, Ann Arbor, MI.
Gross, K., E. Allen, C. Bledsoe, R. Colwell, P. Dayton, M. Dethier, J. Helly, R. Holt, N. Morin, W. Michener, S. T. Pickett, and S. Stafford. 1995. Report of the Committee on the Future of Long-term Ecological Data (FLED). Ecological Society of America, Washington, D.C.
Hale, S. S., M. M. Hughes, J. F. Paul, R. C. McAskill, S. A. Rego, D. R. Bender, N. J. Dodge, T. L. Richter, and J. L. Copeland. 1998. Managing scientific data: the EMAP approach. Environmental Monitoring and Assessment 51:429-440.
Harvey, C., and J. Press. 1996. Databases in historical research: theory, methods and applications. St. Martin's Press, New York.
Hayes, D. B., J. R. Baylis, L. M. Carl, H. Dodd, J. Goldstein, R. L. McLaughlin, D. L. G. Noakes, L. M. Porto, and R. G. Randall. Biological impact of low-head lamprey barriers: insights from an extensive survey and intensive process-oriented research. Journal of Great Lakes Research.
Hernandez, M. J. 1997. Database design for mere mortals: a hands-on guide to relational database design. Addison-Wesley, New York.
Hilgartner, S., and S. I. Brandt-Rauf. 1994. Data access, ownership, and control. Knowledge: Creation, Diffusion, Utilization 15:355-372.
Kaiser, J. 1999. Searching museums from your desktop. Science 284:888.
Michael, G. Y. 1991. Environmental data bases: design, implementation, and maintenance. Lewis Publishers, Inc., Chelsea, MI.
Milliken, G. A., and D. E. Johnson. 1984. Analysis of messy data. Van Nostrand Reinhold, New York.
Moyle, P. B. 1997. The importance of an historical perspective: fish introductions. Fisheries 22(10):14.
National Research Council (NRC). 1995. Finding the forest in the trees: the challenge of combining diverse environmental data. National Academy Press, Washington, D.C.
Osenberg, C. W., O. Sarnelle, and D. H. Goldberg. 1999. Meta-analysis in ecology: concepts, statistics, and applications. Ecology 80:1103-1104.
Porto, L. M., R. L. McLaughlin, and D. L. G. Noakes. 1999. Low-head barrier dams restrict the movements of fishes in two Lake Ontario streams. North American Journal of Fisheries Management 19:1028-1036.
Reichhardt, T. 1998. Alarm in US over database antipiracy bill. Nature 394:410.
Reichhardt, T. 1999. It's sink or swim as a tidal wave of data approaches. Nature 399:517-520.
Schnute, J. T., and L. J. Richards. 1994. Stock assessment for the 21st century. Fisheries 19(11):10-16.
Tilman, D., R. M. May, C. L. Lehman, and M. A. Nowak. 1994. Habitat destruction and the extinction debt. Nature 371:65-66.
Van Alstyne, M., E. Brynjolfsson, and S. Madnick. 1995. Why not one big database? Principles of data ownership. Decision Support Systems 15:267-284.

