P. Guilizzoni and F. Oldfield (Guest Editors) Palaeoenvironmental Analysis of Italian Crater Lake and Adriatic Sediments Mem. Ist. ital. Idrobiol., 55:321-328, 1996

The PALICLAS database

Stephen JUGGINS
Department of Geography, University of Newcastle, Newcastle upon Tyne NE1 7RU, UK

ABSTRACT

PALICLAS is a collaborative research project on the palaeoenvironmental analysis of Italian crater lake and Adriatic sediments. Central to the aims of the project is the need to make direct comparisons between different biological, chemical and physical proxy indicators measured on a range of lacustrine and marine sediments. To ensure compatibility and consistency between these different data types, and to provide a rapid and efficient means of comparing and cross-correlating the different proxy records within and between sites, all data from the project has been stored in a single relational database. The database is implemented using Microsoft Access Version 2.0 database software and currently consists of 38 separate tables, plus stored queries, and contains data on core location, chronology, lithology, geochemistry, pigments, magnetic parameters, carbon and oxygen isotopes, pollen, diatoms, chrysophytes, ostracods, cladocera, chironomids, and foraminifera. In addition to these raw data, derived age-depth models and core-correlation models, the database also contains a number of linked C++ programs to facilitate the display of multiproxy stratigraphic records, and the preparation of data for subsequent statistical analyses. The paper describes the structure of the database and the additional applications software.

Key words: palaeoecological database, multi-proxy records, core correlation, age-depth models, stratigraphic data display

1. INTRODUCTION

PALICLAS is a collaborative research project on the palaeoenvironmental analysis of Italian crater lake and Adriatic sediments. Its main aims are to provide improved fine-resolution proxy palaeoclimate records for central Italy for the last 25k+ years, and to characterise the responses of both terrestrial and aquatic ecosystems during periods of rapid climatic change. Central to these aims has been the need to make direct comparisons between different biological, chemical and physical proxy records that have been measured in the sediments of Lake Albano and Lake Nemi, two crater lakes situated in the Alban Hills to the south of Rome, and a wide range of cores from the Adriatic (Oldfield 1996a, this volume). To ensure compatibility and consistency between these different data types, and to provide a rapid and efficient means of comparing and cross-correlating the different proxy records within and between sites, all data from the project has been stored in a single relational database. This paper describes the structure of that database and associated software, and the types of data it currently holds.

2. DATABASE STRUCTURE

The philosophy that guided the development of the PALICLAS database was that as well as providing a mechanism for data storage and a long-term data archive, it should also contain a range of software tools to facilitate the rapid retrieval, display and numerical analysis of the multi-proxy records it contains. With these aims in mind it is convenient to consider the database as consisting of three parts: (1) tables containing raw data, (2) views, or saved queries, that present the raw data in a form suitable for subsequent numerical or graphical analysis, and (3) applications software.

The raw data tables and views are currently stored in a relational database using Microsoft Access Version 2.0 database software. Application software has been written using Microsoft Visual C++ Version 4.0. The combination of Microsoft Access and C++ is particularly suited to the PALICLAS database because (1) both products are available for the IBM PC and Macintosh environments, allowing us to undertake development work on the PC, and to recompile code on the Macintosh with little additional effort, (2) the database front-end can be quickly prototyped and implemented using Visual Basic for Applications, and this can be seamlessly integrated with numerical software tools that will be developed in the more appropriate C++ language, and (3) it allows us to make extensive use of OLE Custom Controls, which will greatly reduce programming effort in areas of graphical analysis, database output, and user-interface design. The three main database components - tables, queries, and software - are described in more detail below.

2.1. Data tables

PALICLAS data is currently stored within the database in 38 separate tables (Figure 1). The database structure is loosely based on the models developed for the European and North American Pollen Databases (EPD and NAPD) (Grimm 1995) and the diatom database developed at University College London (Munro et al. 1990), and extended by the inclusion of additional tables to incorporate new types of data. This will facilitate future data transfer between PALICLAS and EPD, or with other developing palaeolimnological databases.

Tables within the PALICLAS database can be divided into six main types according to the type of data they store (Tab. 1). Figure 1 also classifies the tables as either Archive Tables that contain raw data, or Research Tables that contain derived, or inferred information. For example, the original 14C determinations are stored in an archive table, but the inferred age-depth model that uses these dates is stored in a research table. The Table Groups are briefly described below.

Fig. 1. PALICLAS database tables. Dashed boxes group archive tables, dotted boxes group research tables.

Tab. 1. Summary of PALICLAS data tables.

Table Group | Table Names | Description
Cores | CoreDescription | Core location, water depth, coring notes
Physical & Chemical | Lithology, Magnetics, Geochemistry, Pigments, Isotopes | Sample information (core, depth) for each physical and chemical parameter
Biological | Pollen, Diatoms, Chrysophytes, Chironomids, Foraminifera, Ostracods | Sample, taxonomic, and ecological information, and counts for each microfossil group
Chronology | Palaeomagnetics, SpotDate | Tables that store raw dating information (14C, tephra, palaeomagnetics)
Correlation | MappingBasis, CoreMapping | Tables defining a mapping function that numerically describes the correlation between cores
Age-Depth | ModelBasis, AgeDepthModel | Tables that define age-depth models for each core

• Cores: Contains a single table CoreDescription that records the location, water depth, coring device and coring notes of each PALICLAS core.

• Physical & Chemical: Contains six tables that record the physical and chemical parameters measured on each core sample. Each row in a table contains information on several parameters relating to a single level and is uniquely referenced by core name and sample depth.

• Biological: Contains 22 tables that record information on the various microfossil groups studied in the project. Information for each group is contained in 3 or 4 tables. The Samples table records the core name and depth of each sample, together with other sample-specific information such as sample weight. The Counts table records individual microfossil counts, one row for each taxon / sample couplet. The Vars table contains the code and full name of each taxon recorded, together with summary ecological classifications (e.g., planktonic / benthic for diatoms & foraminifera; trees, shrubs etc. for pollen). Data management for most of the biological data was relatively straightforward as all samples were analysed and recorded using a standardised methodology and consistent taxonomic nomenclature. However, some differences in nomenclature and level of taxonomic resolution occurred between pollen data generated by different laboratories. An additional Harmonise table was therefore defined to reconcile these differences. The use of this table is described below.

• Chronology: Contains two tables recording chronological information. The first contains information on palaeomagnetic parameters. The second, SpotDate, contains information on spot dates (14C determinations, tephra etc.).

• Correlation: Contains two tables that define correlations between pairs of cores. The first, MappingBasis, describes the basis of the mapping function in terms of the types of data used, for example, magnetic susceptibility, diatom biostratigraphy etc. The second, CoreMapping, contains a numerical description of the mapping function. This is used by application software to express the levels in one core on the depth scale of a second "master" core.

• Age-Depth: Contains two tables that define age-depth models for PALICLAS cores. The first, ModelBasis, contains a description of the age-depth model and records information on each dated level, or control point, expressed in calibrated 14C years BP, together with the numerical basis for the model. The second, AgeDepthModel, contains the coefficients of the fitted model. This table is used by application software to express the levels in a core on a calibrated age scale.

All tables or views that contain information on individual core samples are uniquely referenced by fields recording core name, and the top and bottom depth of the core slice. Retrieval queries may be defined to link, or join, different tables using either the core name, or combination of core name and depth, to merge different data types into a single table for output. However, none of these links are "hard-coded" into the database. Consequently it is very easy to add additional tables containing new data types to the database without the need to restructure existing information.

Most PALICLAS partners use the Microsoft Excel spreadsheet (or an Excel-compatible program) for their own local data entry and analysis. A series of protocols were therefore drawn up describing the formatting of each data type using this software. Laboratories then submit data as Excel spreadsheet files. The only exception to this is pollen data, which are submitted in TILIA format (Grimm 1991-3). The system of data transfer adopted within PALICLAS allows each laboratory to use the
hardware / software configuration they wish, and to submit data in a format they are familiar with, but at the same time facilitates quality control and helps maintain the data integrity and compatibility offered by a central relational database. For example, inconsistencies in recording sediment levels or in coding microfossil taxa have been quickly identified and resolved.

2.2. Saved queries

In addition to the data tables described above, the PALICLAS database also contains a number of saved queries that define different "views" of the raw data. The use of saved queries allows great flexibility in the way the data are presented to the user, but avoids the need to store multiple copies of the raw data in different formats. This helps to ensure internal data integrity as the raw data is stored only once - if the raw data changes then data examined via queries is automatically updated. For example, the database contains raw diatom counts for each sample but diatomists usually present this information in percentage form. A DiatomPercentage query therefore allows the data to be viewed, queried or output in percentage format by first calculating the total count for each sample and then multiplying each value in the Counts table by 100/total. Similarly, for Lake Albano the diatom-based core correlations are based on planktonic taxa only. A DiatomPlanktonPercentage query was therefore defined that links the Counts, Samples and Vars tables and allows the output or further querying of the planktonic diatom record, expressed as a percentage of total planktonic taxa.

Pollen counts are managed in the same way and retrieved as either raw counts, or percentages based on different calculated sums. In addition, by joining the Vars table to the Harmonise table a query may be defined that reconciles the differences in taxonomic resolution between different palynologists. This approach allows the original data to be stored with the maximum resolution but allows different cores to be retrieved using a harmonised taxonomy for subsequent comparative analysis.

2.3. Data Retrieval and Application software

Data may be retrieved from the database and manipulated in a number of ways (Fig. 2). The simplest retrieval consists of a single-table or single-view query that retrieves a subset of records and/or a subset of fields from a table or view. An example of this would be the retrieval of the whole-core magnetic susceptibility records for Lake Albano core 1E from the WholeCoreMagnetics table. The result of the query could either be saved as a new table, or exported for subsequent analysis. More complicated is a multi-table join to combine data from two or more different tables or views. An example of this would be to combine pigment and geochemical data and export these data in a single Excel spreadsheet.
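
As a rough illustration of the saved-query approach of Section 2.2, the sketch below rebuilds the two diatom percentage views in SQLite, through Python's standard sqlite3 module, rather than in Access. Only the table names Samples, Counts and Vars and the view names DiatomPercentage and DiatomPlanktonPercentage come from the text. The column names (core, depth, taxon, n, habitat), the taxon codes and the counts are invented for the example and should not be read as the actual PALICLAS schema.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with hypothetical diatom tables and views."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Hypothetical column layout; the real PALICLAS fields are not given.
        CREATE TABLE Samples (core TEXT, depth REAL, PRIMARY KEY (core, depth));
        CREATE TABLE Vars (taxon TEXT PRIMARY KEY, name TEXT, habitat TEXT);
        CREATE TABLE Counts (core TEXT, depth REAL, taxon TEXT, n INTEGER);

        -- Each count as a percentage of the total count in its sample.
        CREATE VIEW DiatomPercentage AS
        SELECT c.core, c.depth, c.taxon, 100.0 * c.n / t.total AS pct
        FROM Counts AS c
        JOIN (SELECT core, depth, SUM(n) AS total
              FROM Counts GROUP BY core, depth) AS t
          ON t.core = c.core AND t.depth = c.depth;

        -- Planktonic taxa only, as a percentage of the planktonic sum.
        CREATE VIEW DiatomPlanktonPercentage AS
        SELECT c.core, c.depth, c.taxon, 100.0 * c.n / t.total AS pct
        FROM Counts AS c
        JOIN Vars AS v ON v.taxon = c.taxon AND v.habitat = 'planktonic'
        JOIN (SELECT c2.core, c2.depth, SUM(c2.n) AS total
              FROM Counts AS c2
              JOIN Vars AS v2 ON v2.taxon = c2.taxon
              WHERE v2.habitat = 'planktonic'
              GROUP BY c2.core, c2.depth) AS t
          ON t.core = c.core AND t.depth = c.depth;
    """)
    # Invented demonstration data: one sample, three taxa.
    con.executemany("INSERT INTO Samples VALUES (?, ?)", [("1E", 10.0)])
    con.executemany("INSERT INTO Vars VALUES (?, ?, ?)", [
        ("CYCL", "Cyclotella sp.", "planktonic"),
        ("STEP", "Stephanodiscus sp.", "planktonic"),
        ("NAVI", "Navicula sp.", "benthic"),
    ])
    con.executemany("INSERT INTO Counts VALUES (?, ?, ?, ?)", [
        ("1E", 10.0, "CYCL", 30),
        ("1E", 10.0, "STEP", 10),
        ("1E", 10.0, "NAVI", 60),
    ])
    return con

con = build_demo_db()
pct = dict(con.execute("SELECT taxon, pct FROM DiatomPercentage").fetchall())
plankton = dict(
    con.execute("SELECT taxon, pct FROM DiatomPlanktonPercentage").fetchall()
)
```

Because both views are computed on demand from Counts, correcting a single raw count changes the percentages they return without any further bookkeeping, which is the integrity property that motivates storing the raw data only once.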

Fig. 2. PALICLAS data retrieval and manipulation, showing relationships between original raw data or saved queries, retrieval queries, optional projection to common depth or age scale, optional data smoothing and optional interpolation to regular depth or age intervals, and data output.
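
The projection and interpolation steps in Fig. 2 might be sketched as follows. This is not the project's own software. The paper does not state the form of the mapping function held in CoreMapping or of the model held in AgeDepthModel, so the sketch assumes a piecewise-linear mapping between tie points and a polynomial age-depth model. All function names, tie points, coefficients and sample values are hypothetical.

```python
from bisect import bisect_right

def piecewise_linear(x, xs, ys):
    """Interpolate y at x between tie points (xs strictly ascending).

    Returns None when x lies outside the tie-point range, so that no
    value is extrapolated beyond the correlated interval.
    """
    if x < xs[0] or x > xs[-1]:
        return None
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def polynomial_age(depth, coeffs):
    """Evaluate age = c0 + c1*depth + c2*depth**2 + ... by Horner's rule."""
    age = 0.0
    for c in reversed(coeffs):
        age = age * depth + c
    return age

def regular_grid(xs, ys, start, stop, step):
    """Resample an irregular series onto regularly spaced x values."""
    n = int(round((stop - start) / step))
    out = []
    for k in range(n + 1):
        x = start + k * step
        y = piecewise_linear(x, xs, ys)
        if y is not None:
            out.append((x, y))
    return out

# Assumed tie points: depth in core 3A -> equivalent depth in master core 1E.
tie_3a = [0.0, 100.0, 200.0]
tie_1e = [0.0, 150.0, 250.0]

# Invented proxy record measured on core 3A.
depths_3a = [50.0, 100.0, 150.0]
values = [3.0, 6.0, 8.0]

# Step 1: project sample depths onto the master-core depth scale.
master_depths = [piecewise_linear(d, tie_3a, tie_1e) for d in depths_3a]

# Step 2: project master depths onto an age scale (assumed: age = 10 * depth).
ages = [polynomial_age(d, [0.0, 10.0]) for d in master_depths]

# Step 3: interpolate the record onto regular 250-year intervals.
resampled = regular_grid(ages, values, 750.0, 2000.0, 250.0)
```

Returning None outside the tie-point range is a deliberate choice in this sketch: levels that fall beyond the correlated interval are dropped rather than extrapolated.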

All data stored in the database is referenced by its core name and sample depth. To compare data from different cores it is obviously necessary to express the samples on a common scale. The PALICLAS database allows this to be done in a number of ways using a custom-written CoreProjection program to directly retrieve information from the database and "project" core samples onto an alternative depth or age scale. Using information stored in the CoreMapping table, the records from any core can be projected onto the depth-scale of a "master" core from the same basin. For example, in the development of age-depth models for Lake Albano, a core mapping function was defined that allows data from Lake Albano core 3A to be projected onto a core 1E equivalent depth scale. The same core-projection software can also use information in the AgeDepthModel table to project core samples onto a calibrated 14C age scale.

Once retrieved on a common depth or age scale, data may be output from the database in three ways. The first is direct output from the database in Excel format.
This provides access to Excel's modelling tools, as well as a convenient format for importing the data into many statistical software packages. The second is via a custom-written program to convert the data into a format suitable for subsequent analysis by a variety of specialist numerical / statistical software packages. So far routines have been written to convert the data to TILIA (Grimm 1991-3), Cornell (CANOCO) (ter Braak 1988), PCSLOT (Thompson & Clark 1989) and COMPCURF (Pels et al. 1996) formats.

One major problem in managing and analysing multi-core data is that samples are inevitably recorded at different "equivalent" depths or ages in different cores. Even when the data are derived from the same core, the nature of the different analyses, and their demands for varying amounts of sediment, mean that few analyses are carried out on the same sediment slice throughout the length of the core. Given this problem the database contains custom software to efficiently plot multiple records that have been analysed at different core intervals. This software allows the co-plotting of multiple data types recorded at different depth or age intervals for data presentation, or for visual comparison of between-record correlations.

The next step in the analysis of such data is to attempt to quantify the strength and test the significance of the observed patterns of co-variation. However, numerical analysis of these data is problematic because most numerical techniques require that the records to be compared have been measured at the same points in time (or depth). A further, more stringent requirement of some analyses is that the data points should also be regularly spaced in time. To prepare data for such analyses the PALICLAS database also contains routines to smooth and / or interpolate each multi-proxy time series onto (1) the depth or age intervals of another proxy record, or (2) common, regularly spaced depth or age intervals.

3. CONCLUDING STATEMENT

The PALICLAS database currently contains over 150,000 data points, representing measurements of over 15 different proxy indicators from 4000+ levels in a total of 14 long and 10 short cores. In addition to providing a long-term data archive for the project, the database also provides a unique research tool for the rapid retrieval, comparison and cross-correlation of different proxy records within the same core, between different cores from the same site, and between the marine and lake records. So far the database has facilitated core-correlation, the construction of age-depth models, and the graphical display of these multi-proxy datasets on common depth or calibrated age scales (e.g., Oldfield 1996b, this volume). Future use of the database will concentrate on the numerical analysis of these data, to test the significance and quantify the strength of the various inter-proxy relationships, with an aim to develop models of aquatic and terrestrial ecosystem response to past climatic fluctuations.

REFERENCES

Grimm, E.C. 1991-3. Tilia Version 2.0. Illinois State Museum, Research and Collections Centre.
Grimm, E.C. 1995. North American pollen database. In: D.M. Anderson (Ed.), Global Paleoenvironmental Data. A report from the workshop sponsored by Past Global Changes (PAGES) August, 1993, Report Series 95-2, Berne: 53-56.
Munro, M.A.R., A.M. Kreiser, R.W. Battarbee, S. Juggins, A.C. Stevenson, D.S. Anderson, N.J. Anderson, F. Berge, H.J.B. Birks, R.B. Davis, R.J. Flower, S.C. Fritz, E.Y. Haworth, V.J. Jones, J.C. Kingston & I. Renberg. 1990. Diatom quality control and data handling. Philosophical Transactions of the Royal Society of London, B, 327: 257-261.
Oldfield, F. 1996a. PALICLAS: the partnership, its perspectives and goals. In: Guilizzoni, P. & F. Oldfield (Eds), Palaeoenvironmental Analysis of Italian Crater Lake and Adriatic Sediments (PALICLAS). Mem. Ist. ital. Idrobiol., 55: 7-16.
Oldfield, F. 1996b. The PALICLAS Project: synthesis and overview. In: Guilizzoni, P. & F. Oldfield (Eds), Palaeoenvironmental Analysis of Italian Crater Lake and Adriatic Sediments (PALICLAS). Mem. Ist. ital. Idrobiol., 55: 329-357.
Pels, B., J.J. Keizer & R. Young. 1996. Automated biostratigraphic correlation of palynological records on the basis of shapes of pollen curves and evaluation of next-best solutions. Palaeogeography, Palaeoclimatology, Palaeoecology, 124: 17-37.
ter Braak, C.J.F. 1988. CANOCO - a FORTRAN program for canonical community ordination by [partial] [detrended] [canonical] correspondence analysis, principal components analysis and redundancy analysis. Technical Report LWA-88-02. Agricultural Mathematics Group, Wageningen.
Thompson, R. & R.M. Clark. 1989. Sequence slotting for stratigraphic correlation between cores: theory and practice. J. Paleolimnol., 2: 173-184.