
Bekker, R.M. et al.

Long-term datasets: From descriptive to predictive data using ecoinformatics

Bekker, Renée M.1*; van der Maarel, Eddy1; Bruelheide, Helge2 & Woods, Kerry3

1Community and Conservation Ecology Group, University of Groningen, P.O. Box 14, 9750 AA Haren, The Netherlands; 2Institute of Biology / Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Am Kirchtor 1, 06108 Halle, Germany; 3Natural Sciences, Bennington College, Bennington VT 05201, USA; *Corresponding author; Fax +31 50363 2273; E-mail [email protected]

Abstract. This Special Feature includes contributions on the processing of large ecological datasets under the heading of ecoinformatics. With this, the term is now also established in the Journal of Vegetation Science. Ecoinformatics is introduced as a rapidly growing field within community ecology that is generating exciting new developments in ecology, and in vegetation ecology in particular. In our field, ecoinformatics deals with understanding patterns of species distribution at local and regional scales, and with the assemblage of species in relation to their properties, the local environment and their distribution in the region. Ecoinformatics is related to bioinformatics, community ecology, biogeography and macroecology. We show how ecoinformatics in vegetation science, and particularly the IAVS Working Group on Ecoinformatics, developed from the work of the former Working Group for Data-Processing, which was active during the 1970s and 1980s. Recent developments, including the creation of TURBOVEG and SynBioSys in Europe and VEGBANK in the USA, form a direct link with these pioneering activities, both scientifically and personally. The contributions collected in this Special Feature present examples of several types of database use and of the application of programs and models. The main types are the study of long-term vegetation dynamics in different cases of primary and secondary succession, and the interpretation of successional developments in terms of species traits. Among the future developments of great significance, we highlight the use of a variety of large datasets to study the distribution, ecology and conservation of rare and threatened species.

Keywords: Biogeography; Bioinformatics; Community ecology; Database; Macroecology; Ordination; Predictive modelling; Trait.

This Special Feature of the Journal of Vegetation Science includes papers originating from a session at the 48th Conference of the International Association of Vegetation Science (IAVS) in Lisbon, July 2005. This session was inspired by the 46th IAVS meeting in Naples, June 2003, where the term ‘ecoinformatics’ was formally introduced in IAVS circles through the lecture ‘Ecoinformatics and the future of community ecology’ by Peet (2003). This was the start of the Working Group on Ecoinformatics (see IAVS Bulletin 9, p. 17). Since then, several meetings and sessions involving ecoinformatics have been held in Hawaii; Halle, Germany; and Palmerston North, New Zealand. The Working Group has a website (http://www.bio.unc.edu/faculty/peet/vegdata/resources.htm) with references to scores of databases and information sources. Nevertheless, until now no paper with ecoinformatics in the title or keywords has appeared in the Journal of Vegetation Science or Applied Vegetation Science. Publication of the contributions presented here is thus a milestone in the history of IAVS.

This rapidly growing subfield of community ecology is generating exciting new developments. New ways of analysing large datasets by data mining, and of exploiting links with databases other than floristic ones, such as databases on plant life-history traits, phylogeny, climate and soil, provide new insights into vegetation ecology (e.g. Gégout et al. 2005). The emergence of many large digital vegetation datasets, containing permanent-plot data and other floristic data together with a huge amount of metadata, has emphasized the need for worldwide cooperation in developing standards for data structures, exchange formats, protocols and software to enable the exchange of this information. Moreover, computational and statistical developments have allowed routines for processing and analysing these datasets to evolve rapidly.
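The kind of linkage described above, between floristic records and a trait database, can be illustrated with a minimal sketch. It assumes a hypothetical relevé table of (plot, species, percentage cover) records and a hypothetical trait table keyed by species name; the species names and values are placeholders, not data from any database mentioned here. The cover-weighted mean trait per plot is only one common way of summarizing such a join and is chosen purely for illustration; the share of cover lacking trait data is reported as well, because incomplete trait coverage is a routine problem when databases are linked.

```python
from collections import defaultdict

# Hypothetical relevé records: (plot_id, species, percentage cover).
# Species names and values are placeholders, not real survey data.
RELEVES = [
    ("P1", "species_a", 40.0),
    ("P1", "species_b", 20.0),
    ("P2", "species_c", 60.0),
    ("P2", "species_d", 20.0),
    ("P2", "species_e", 20.0),  # no trait record exists for this species
]

# Hypothetical trait table keyed by species name (e.g. seed mass in mg).
SEED_MASS = {
    "species_a": 0.5,
    "species_b": 2.0,
    "species_c": 4.0,
    "species_d": 1.0,
}


def link_traits(releves, traits):
    """Join occurrence records to a trait table.

    Returns {plot_id: (cover_weighted_mean, unmatched_cover_share)}.
    The weighted mean uses only species that have a trait value and
    is None when no species in the plot could be matched.
    """
    weighted = defaultdict(float)  # sum of cover * trait value
    matched = defaultdict(float)   # cover of species with trait data
    total = defaultdict(float)     # cover of all recorded species
    for plot, species, cover in releves:
        total[plot] += cover
        value = traits.get(species)
        if value is not None:
            weighted[plot] += cover * value
            matched[plot] += cover
    result = {}
    for plot, plot_total in total.items():
        mean = weighted[plot] / matched[plot] if matched[plot] > 0 else None
        unmatched = 1.0 - matched[plot] / plot_total if plot_total > 0 else 0.0
        result[plot] = (mean, unmatched)
    return result


if __name__ == "__main__":
    for plot, (mean, unmatched) in sorted(link_traits(RELEVES, SEED_MASS).items()):
        print(plot, mean, round(unmatched, 2))
```

With the toy data, plot P1 gets a weighted mean of 1.0 with no unmatched cover, and plot P2 gets 3.25 with 20 % of its cover unmatched. In practice, name standardization across taxonomies would have to precede such a join; that step is omitted here.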
Ecoinformatics deals with understanding patterns of species distribution at local and regional scales, and with the assemblage of species in relation to their properties, the local environment and their distribution in the region. At present, the spatial scale of study tends to be somewhat larger than is common in community ecology but smaller than that in macroecology; with expanding databases, however, ecoinformatics will increasingly contribute to current topics in macroecology.

The field of ecoinformatics appears to be developing amid adjacent fields such as bioinformatics, community ecology, biogeography and macroecology. Bioinformatics focuses mainly on solving biological problems at the molecular level through applied mathematics, statistics and computational science, in collaboration with (bio)chemistry (Anon. 2004; Baxevanis & Ouellette 2005). Community ecology, by contrast, nowadays addresses questions related to (1) species distribution, abundance and demography; (2) interactions between coexisting populations on focal plots; and (3) local community assemblages, biodiversity and species dynamics (Odum 1971; Diamond 1975; Weiher & Keddy 1999; Loreau et al. 2001; Tilman 2004). Macroecology (first defined by Brown & Maurer 1989) focuses mainly on ecological patterns and on interactions between organisms, their characteristics and the environment at large spatial scales, e.g. the distribution of body size over whole continents. An emphasis on the ecosystem approach and on ecosystem properties calls for large databases of species occurrences and dynamics, biodiversity information and species properties. These can be analysed with multivariate techniques and complex spatial statistics to infer the distribution of properties and patterns over large areas (Gaston & Blackburn 2000; Burns 2007). Recently, it has become possible to link long-term datasets describing vegetation dynamics with complex climate data, with other environmental descriptors such as soil and moisture parameters and soil microorganism populations, and with plant functional traits.
With these new data-mining techniques, new insights in vegetation ecology have emerged, and they are extending their influence in two directions:
1. The flow of data into theoretical ecological models has increased significantly, so that, for example, discussions about the neutral theory of biodiversity (Hubbell 2001) versus niche theory have become part of modern empirical vegetation ecology.
2. Long-term datasets are being used by expert systems that advise nature management authorities on future management practices, based on the historical development documented in vegetation records as well as on the ecological traits of plant species (for example, SynBioSys-Europe; Schaminée et al. 2007, this issue).
The topic is so active that in 2006 a new journal, ‘Ecological Informatics’, was launched; so far it has focused mainly on new analytical tools for analysing patterns in data ranging from genomes to ecosystems.


Among the contributions included in this Special Feature, the paper on SynBioSys by Schaminée et al. (2007, this issue) is a logical starting point, because it establishes a link with the computer programs developed during the 1970s and 1980s in the IAVS Working Group for Data-Processing. The major aims of that group were the construction of large phytosociological datasets and the development of classification and ordination programs to treat the assembled data (van der Maarel et al. 1980; Grabherr et al. 1989). The largest dataset built up in that period comprised ca. 7000 European salt-marsh relevés (Kortekaas et al. 1980). Several computer programs developed by members of this Working Group and by colleagues cooperating with it have been widely used, including TABORD (van der Maarel et al. 1978) and its successor FLEXCLUS (van Tongeren 1986). Two major programs dating from that time, TWINSPAN (Hill 1979a) and DECORANA (Hill 1979b), are still in use, the latter in the form of CANOCO (ter Braak & Šmilauer 1998). Part of the history of this development can be found in the textbook by Jongman et al. (1995), which appeared in the same year that Hennekens (1995) published the first version of TURBOVEG, whose development forms a historical continuum with the programs mentioned above. Many of the multivariate techniques have been made available in the PC-ORD suite (McCune & Grace 2002). Alongside the development of tools, similar progress was achieved in assembling databases. The initiative of Robert Peet and others in the USA resulted in the standardized vegetation database VEGBANK (URL: http://www.vegbank.org). Another example from the wealth of databases we could cite is the massive database assembled by the USDA Forest Service since 1905, containing permanent-plot inventories and large quantities of metadata. These databases, and the programs for treating the data, were used mainly for the classification and ordination of plant communities.
Later on, the structure of the databases had to be adapted for use in studies of vegetation dynamics. An early example from within the Working Group for Data-Processing (which by then had developed into the Working Group for Theoretical Vegetation Science; e.g. Grabherr et al. 1989) was the dataset of permanent plots in the dune slacks of Voorne, The Netherlands, comprising over 1400 analyses (e.g. Noest et al. 1989). An example of a database adapted for the study of vegetation dynamics is the collection of long-term observations from permanent vegetation plots originally established by Braun-Blanquet in the Swiss National Park to study the reforestation of pastures abandoned in 1914, the year the Park was founded (Schütz et al. 2000). Wildi & Schütz (2007, this issue) synthesize the data from 982 relevés from 74 plots with different starting points and different observation frequencies by creating ‘restored time series’, and thus hypothesize the pathways of vegetation successions much longer than can be recorded in an ecologist’s lifetime.

Another succession study is presented by del Moral (2007, this issue), who describes a case of primary succession on Mount St. Helens, Washington, USA. Here, the dataset comprises yearly analyses of 20 permanent plots along a 1-km transect spanning an altitudinal range of 250 m, from 1980 (when the volcanic eruption occurred) to 2005. The questions addressed are: ‘Do changes along the transect recapitulate succession?’ and ‘Do plots converge to similar composition over time?’ Floristic changes were characterized by techniques including DCA and clustering.

The study of species dynamics by Ozinga et al. (2007, this issue) makes use of the large databank of Dutch relevés but concentrates on plant trait data derived from the LEDA database of life-history traits of the Northwest European flora (Kleyer 1995; Thompson et al. 1997; Poschlod et al. 2003; Knevel et al. 2003, 2005). The persistence of vascular plants in 845 permanent plots across The Netherlands was monitored once a year for 5 to 40 years. The questions asked are: ‘Which plant traits and habitat characteristics best explain the local above-ground persistence of vascular plant species?’ and ‘Is there a trade-off between local above-ground persistence and the capacity for seed dispersal and for below-ground persistence in the soil seed bank?’ The study by Endels et al. (2007, this issue) resembles the previous one in that other databases were used, here to characterize 137 forest plant species in 239 ‘forest patches’ in southern Belgium, each with a homogeneous land-use history and a unique forest age.
Data on the plant traits of all species found in these forest patches were gathered and integrated from five plant databases; altogether, 24 plant traits were included in the analysis (Hodgson et al. 1995; Klotz et al. 2002). On the basis of these traits, three groups of species emerged: (1) mainly shrubs or climbers with fleshy or wind-dispersed fruits and a high dispersal potential; (2) largely small, mainly vegetatively reproducing herbs; (3) mainly spring-flowering herbs with large seeds and unassisted dispersal.

A major database for data from long-term field experiments related to nature management problems is the one developed by R.H. Marrs, M.G. Le Duc, R.J. Pakeman and others at the Applied Vegetation Dynamics Laboratory (AVDL, Liverpool). Le Duc et al. (2007, this issue) describe several data-management problems of importance for long-term studies, including data manipulation, archiving, quality assessment and flexible retrieval for analysis. An extension compared with TURBOVEG-managed databases is the inclusion of facilities for handling the type of metadata generated by complex experiments and structured surveys. To improve the use of the phytosociological data, a ‘purpose-built’ database was developed that can store AVDL datasets, with a system of standard queries for future investigations and meta-analyses.

Databases of phytosociological data can also be used to address autecological questions, by extracting data on the occurrences of a particular species in relation to the abiotic data stored in the database. Coudun & Gégout (2007, this issue) demonstrate this by relating the distribution and abundance of Vaccinium myrtillus in France to climatic and edaphic factors and developing a predictive model. They used the phyto-ecological database EcoPlant, from which they extracted 2905 forest sites. The model was then applied to 9830 independent relevés extracted from Sophy, another large phytosociological database from France.

A different type of database, probably first developed in the forestry community, includes only characteristics (traits) of individual species. A good example is the database of tree measurements in the forest reserve network of the Swiss Federal Institute of Technology Zurich (ETH Zurich), currently maintained jointly by the Chair of Forest Ecology (ETH Zurich) and the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL Birmensdorf). The network was initiated in the late 1940s to observe the long-term dynamics of unmanaged, near-natural forests in Switzerland (Leibundgut 1957). Wunder et al. (2007, this issue) discuss the use of this database, applying it to predict mortality probabilities of deciduous and coniferous trees from permanent-plot data describing growth patterns, tree species, tree size and site conditions. The RAINFOR database for monitoring forest biomass, described by Peacock et al. (2007, this issue), combines time series and species characteristics.
In this case, more than 300 000 records of tree diameter measurements from over 100 permanent sample plots across Amazonia were incorporated. The database is designed to enable linkage to existing soil, floristic or plant trait databases. The study by Woods (2007, this issue) is another example of a community dynamics study, largely based on woody stem diameter data from several hundred permanent plots, most of them dating from 1935. A main question is whether the changes in these old-growth forests in Michigan are predictable, and whether the answers depend on temporal and spatial scales. How can complex, long-term observational data be used most powerfully to address these questions?
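One simple way to ask whether plots converge, the question posed by del Moral and, for late succession, by Woods, is to track the mean pairwise dissimilarity among plots surveyed in the same year. The sketch below uses Bray-Curtis dissimilarity and an ordinary least-squares slope over survey years. This is a generic measure swapped in for illustration, not the DCA and clustering analyses used in those studies; the nested-dictionary survey layout and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {species: abundance} dicts."""
    species = set(a) | set(b)
    num = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in species)
    den = sum(a.get(s, 0.0) + b.get(s, 0.0) for s in species)
    return num / den if den > 0 else 0.0


def mean_dissimilarity_by_year(surveys):
    """surveys: {year: {plot_id: {species: cover}}} (hypothetical layout).

    Returns {year: mean pairwise Bray-Curtis dissimilarity among plots};
    years with fewer than two plots are skipped.
    """
    result = {}
    for year in sorted(surveys):
        plots = list(surveys[year].values())
        pairs = [bray_curtis(p, q) for p, q in combinations(plots, 2)]
        if pairs:
            result[year] = mean(pairs)
    return result


def trend_slope(series):
    """Least-squares slope of dissimilarity on year; a negative value suggests convergence."""
    years = list(series)
    if len(years) < 2:
        raise ValueError("need at least two survey years")
    values = [series[y] for y in years]
    my, mv = mean(years), mean(values)
    sxy = sum((y - my) * (v - mv) for y, v in zip(years, values))
    sxx = sum((y - my) ** 2 for y in years)
    return sxy / sxx


# Toy data (hypothetical): two plots that start with different species
# and become more similar over time.
SURVEYS = {
    1981: {"A": {"sp1": 10.0}, "B": {"sp2": 10.0}},
    1990: {"A": {"sp1": 10.0, "sp2": 5.0}, "B": {"sp1": 5.0, "sp2": 10.0}},
}

if __name__ == "__main__":
    d = mean_dissimilarity_by_year(SURVEYS)
    print(d, trend_slope(d))
```

A single slope is a coarse summary; with real permanent-plot data one would also want to check whether the trend depends on the time window and on which plots are included, which is the scale question raised above.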

A final group of two papers deals with smaller case studies that use their own databases but nevertheless contribute to the development of ecoinformatics. Szwagrzyk & Gazda (2007, this issue) address the well-known relationship between species diversity and various aspects of ecosystem functioning in natural temperate deciduous forests of Poland and adjacent countries. Two hypotheses were tested: (1) above-ground tree biomass is positively correlated with the number of tree species and with the Simpson diversity index calculated for tree species; and (2) above-ground tree biomass is positively correlated with the number of functional groups and with the Simpson diversity index calculated for functional groups of tree species. Data were collected in 34 forest stands. Finally, Wang (2007, this issue) studied leaf trait co-variation in a chronosequence comprising four stages of secondary succession on the Loess Plateau of northwestern China, ranging from a few years to 150 years. Specific leaf area, leaf mass per area, leaf nitrogen, leaf phosphorus and leaf dry matter content were related to above-ground net primary productivity, the specific rate of litter mass loss, and total soil carbon and nitrogen.

Broad perspectives are opening up for, and arising from, ecoinformatics in many countries around the world. For instance, in the USA, Canada and New Zealand, several parallel initiatives and programs by cooperating government agencies have been launched that will increasingly depend on, and generate, questions to be solved by means of ecoinformatics. The European Union has issued directives prescribing the monitoring of particular protected plant or animal populations and of rare or vulnerable ecosystems, and prohibiting economic developments that may harm nature conservation interests.
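The two hypotheses tested by Szwagrzyk & Gazda (2007, this issue) can be made concrete with a small calculation. This sketch assumes the Gini-Simpson form of the index, 1 - sum(p_i^2), where p_i is the proportion of a species (or functional group) in some stand-level abundance measure. The text does not state which form of the Simpson index or which abundance measure the authors used, so both are assumptions. Pooling species into functional groups before computing the index corresponds to the second hypothesis, and a Pearson correlation across stands stands in as a generic test statistic. The species-to-group mapping and all values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def gini_simpson(abundances):
    """Gini-Simpson index 1 - sum(p_i^2) for non-negative abundances."""
    values = list(abundances)
    total = sum(values)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return 1.0 - sum((a / total) ** 2 for a in values)


def pool_by_group(by_species, groups):
    """Sum {species: abundance} into {functional_group: abundance}."""
    pooled = defaultdict(float)
    for species, abundance in by_species.items():
        pooled[groups[species]] += abundance
    return dict(pooled)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (e.g. biomass vs. diversity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical stand: species abundances (e.g. basal area) and hypothetical group labels.
    stand = {"sp1": 2.0, "sp2": 3.0, "sp3": 5.0}
    groups = {"sp1": "group_a", "sp2": "group_a", "sp3": "group_b"}
    print(gini_simpson(stand.values()))
    print(gini_simpson(pool_by_group(stand, groups).values()))
```

Because (a + b)^2 >= a^2 + b^2 for non-negative abundances, pooling species into groups can never raise the Gini-Simpson value. The group-level index is therefore bounded above by the species-level one, which is worth keeping in mind when the two hypotheses are compared.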
Policy makers have acknowledged the importance of developing predictive community models, of ecoinformatics, and of exploring multilayered plant and animal co-occurrence data at different spatial scales. Much work remains to be done in spatial statistics, combining georeferenced biotic and abiotic information to predict spatial community patterns. Here, first insights into the processes at work in community assembly have already been gained by combining floristic, phytosociological and functional information in GIS (e.g. Ozinga et al. 2005). Another new topic is the inference of indicator species from vegetation databases (e.g. Tichý & Chytrý 2006), in combination with functional traits and ecosystem properties; for protected, rare and mobile species, this will require fine-tuning of data-mining techniques and the standardization and linkage of all sorts of ecological databases. The most promising way of linking these databases seems to be through virtual databases that communicate via a network querying approach. Many initiatives for the development of linked local databases have already been set up all over the world, and the first results can be expected soon.

References

Anon. (International Human Genome Sequencing Consortium). 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945.
Baxevanis, A.D. & Ouellette, B.F.F. (eds.) 2005. Bioinformatics: A practical guide to the analysis of genes and proteins. 3rd ed. Wiley, New York, NY, US.
Brown, J.H. & Maurer, B.A. 1989. Macroecology: The division of food and space among species on continents. Science 243: 1145-1150.
Burns, K.C. 2007. Is tree diversity different in the Southern Hemisphere? J. Veg. Sci. 18: 307-312.
Coudun, C. & Gégout, J.-C. 2007. Quantitative prediction of the distribution and abundance of Vaccinium myrtillus with climatic and edaphic factors. J. Veg. Sci. 18: 517-524.
del Moral, R. 2007. Limits to convergence of vegetation during early primary succession. J. Veg. Sci. 18: 479-488.
Diamond, J.M. 1975. Assembly of species communities. In: Cody, M.L. & Diamond, J.M. (eds.) Ecology and evolution of communities, pp. 324-444. Harvard University Press, Cambridge, MA, US.
Endels, P., Adriaens, D., Bekker, R.M., Knevel, I.C., Decocq, G. & Hermy, M. 2007. Groupings of life-history traits are associated with distribution of forest plant species in a fragmented landscape. J. Veg. Sci. 18: 499-508.
Gaston, K.J. & Blackburn, T.M. 2000. Pattern and process in macroecology. Blackwell Science, Oxford, UK.
Gégout, J.-C., Coudun, C., Bailly, G. & Jabiol, B. 2005. EcoPlant: A forest site database linking floristic data with soil and climate variables. J. Veg. Sci. 16: 257-260.
Grabherr, G., Mucina, L., Dale, M.B. & ter Braak, C.J.F. 1989. Progress in theoretical vegetation science. Kluwer, Dordrecht, NL.
Hennekens, S.M. 1995. TURBO(VEG). Software package for input, processing, and presentation of phytosociological data. User’s guide. Instituut voor Bos en Natuur, Wageningen, NL and Unit of Vegetation Science, University of Lancaster, Lancaster, UK.
Hennekens, S.M. & Schaminée, J.H.J. 2001. TURBOVEG, a comprehensive data base management system for vegetation data. J. Veg. Sci. 12: 589-591.
Hill, M.O. 1979a. TWINSPAN, A FORTRAN program for arranging multivariate data in an ordered two-way table by classification of individuals and attributes. Cornell University, Ithaca, NY, US.
Hill, M.O. 1979b. DECORANA, A FORTRAN program for detrended correspondence analysis and reciprocal averaging. Cornell University, Ithaca, NY, US.
Hodgson, J.G., Grime, J.P., Hunt, R. & Thompson, K. 1995. The electronic comparative plant ecology. Chapman & Hall, London, UK.
Hubbell, S.P. 2001. The unified neutral theory of biodiversity and biogeography. Monographs in population biology 32. Princeton University Press, Princeton, NJ, US.


Jongman, R.H.G., ter Braak, C.J.F. & van Tongeren, O. (eds.) 1995. Data analysis in community and landscape ecology. Cambridge University Press, Cambridge, UK.
Kleyer, M. 1995. Biological traits of vascular plants – a database. Arbeitsberichte Institut für Landschaftsplanung und Ökologie, Universität Stuttgart, Stuttgart, DE. See also www.leda-traitbase.org.
Klotz, S., Kühn, I. & Durka, W. 2002. BIOLFLOR – Eine Datenbank mit biologisch-ökologischen Merkmalen zur Flora von Deutschland. Bundesamt für Naturschutz, Bad Godesberg, DE. www.biolflor.de.
Knevel, I.C., Bekker, R.M., Bakker, J.P. & Kleyer, M. 2003. Life-history traits of the Northwest European flora: The LEDA database. J. Veg. Sci. 14: 611-614.
Knevel, I.C., Bekker, R.M., Kunzmann, D., Stadler, M. & Thompson, K. (eds.) 2005. The LEDA Traitbase – collecting and measuring standards of life-history traits of the Northwest European flora. University of Groningen, Groningen, NL.
Kortekaas, W.M., Lausi, D., Beeftink, W.G. & van der Maarel, E. 1980. Survey of salt marsh relevés included in the databank of the Working-Group for data-processing. In: van der Maarel, E., Orlóci, L. & Pignatti, S. (eds.) Data-processing in phytosociology, pp. 207-225. Junk, The Hague, NL.
Le Duc, M.G., Yang, L. & Marrs, R.H. 2007. A database application for long-term ecological field experiments. J. Veg. Sci. 18: 509-516.
Leibundgut, H. 1957. Waldreservate in der Schweiz. Schweiz. Z. Forstwes. 108: 417-421.
Loreau, M., Naeem, S., Inchausti, P., Bengtsson, J., Grime, J.P., Hector, A., Hooper, D.U., Huston, M.A., Raffaelli, D., Schmid, B., Tilman, D. & Wardle, D.A. 2001. Biodiversity and ecosystem functioning: current knowledge and future challenges. Science 294: 804-808.
McCune, B. & Grace, J.B. (with Urban, D.L.) 2002. Analysis of ecological communities. MjM Software Design, Gleneden Beach, OR, US.
Noest, V., van der Maarel, E., van der Meulen, F. & van der Laan, D. 1989. Optimum-transformation of plant species cover-abundance values. Vegetatio 83: 167-178.
Odum, E.P. 1971. Fundamentals of ecology. 3rd ed. W.B. Saunders Company, Philadelphia, PA, US.
Ozinga, W.A., Hennekens, S.M., Schaminée, J.H.J., Bekker, R.M., Prinzing, A., Bonn, S., Poschlod, P., Tackenberg, O., Thompson, K., Bakker, J.P. & van Groenendael, J.M. 2005. Assessing the relative importance of dispersal in plant communities using an ecoinformatics approach. Folia Geobot. 40: 53-67.
Ozinga, W.A., Hennekens, S.M., Schaminée, J.H.J., Smits, N.A.C., Bekker, R.M., Römermann, C., Klimeš, L., Bakker, J.P. & van Groenendael, J.M. 2007. Local above-ground persistence of vascular plants: Life-history trade-offs and environmental constraints. J. Veg. Sci. 18: 489-498.
Peacock, J., Baker, T.R., Lewis, S.L., Lopez-Gonzalez, G. & Phillips, O.L. 2007. The RAINFOR database, for monitoring forest biomass and dynamics. J. Veg. Sci. 18: 535-542.
Peet, R.K. 2003. Information models for managing plot records and multiple taxonomies in vegetation databanks. In: Abstracts 46th Symposium of the International Association of Vegetation Science on Water Resources and Vegetation, p. 175. Università di Camerino, Camerino, IT.
Poschlod, P., Kleyer, M., Jackel, A.-K., Dannemann, A. & Tackenberg, O. 2003. BIOPOP – a database of plant traits and internet application for nature conservation. Folia Geobot. 38: 263-271. (http://www.uni-oldenburg.de/landeco/Projects/biopop/biopop_en.htm)
Schaminée, J.H.J., Hennekens, S.M. & Ozinga, W.A. 2007. Use of the ecological information system SynBioSys for the analysis of large datasets. J. Veg. Sci. 18: 463-470.
Schütz, M., Krüsi, B.O. & Edwards, P.J. 2000. Succession research in the Swiss National Park. From Braun-Blanquet’s permanent plots to models of long-term ecological change. Nationalpark-Forschung in der Schweiz 89: 1-255.
Szwagrzyk, J. & Gazda, A. 2007. Above-ground standing biomass and tree species diversity in natural stands of Central Europe. J. Veg. Sci. 18: 555-562.
ter Braak, C.J.F. & Šmilauer, P. 1998. CANOCO 4 Reference manual and user’s guide to Canoco for Windows: Software for Canonical Community Ordination (version 4). Microcomputer Power, Ithaca, NY, US.
Thompson, K., Bakker, J.P. & Bekker, R.M. 1997. The soil seed banks of North West Europe: methodology, density and longevity. Cambridge University Press, Cambridge, UK.
Tichý, L. & Chytrý, M. 2006. Statistical determination of diagnostic species for site groups of unequal size. J. Veg. Sci. 17: 809-818.
Tilman, D. 2004. Niche tradeoffs, neutrality, and community structure: A stochastic theory of resource competition, invasion, and community assembly. Proc. Natl. Acad. Sci. USA 101: 10854-10861.
van der Maarel, E., Janssen, J.G.M. & Louppen, J.M.W. 1978. TABORD, a program for structuring phytosociological tables. Vegetatio 38: 143-156.
van der Maarel, E., Orlóci, L. & Pignatti, S. (eds.) 1980. Data-processing in phytosociology. Advances in Vegetation Science, Vol. 1. Junk, The Hague, NL.
van Tongeren, O. 1986. FLEXCLUS, an interactive program for classification and tabulation of ecological data. Acta Bot. Neerl. 35: 137-142.
Wang, G. 2007. Leaf trait co-variation, response and effect in a chronosequence. J. Veg. Sci. 18: 563-570.
Weiher, E. & Keddy, P. (eds.) 1999. Ecological assembly rules: perspectives, advances, retreats. Cambridge University Press, Cambridge, UK.
Wildi, O. & Schütz, M. 2007. Scale sensitivity of synthetic long-term vegetation time-series derived through overlay of short-term field records. J. Veg. Sci. 18: 471-478.
Woods, K.D. 2007. Predictability, contingency, and convergence in late succession: Slow systems and complex data-sets. J. Veg. Sci. 18: 543-554.
Wunder, J., Reineking, B., Matter, J.-F., Bigler, C. & Bugmann, H. 2007. Predicting tree death for Fagus sylvatica and Abies alba using permanent plot data. J. Veg. Sci. 18: 525-534.
