BIOTROPICA 47(4): 399–402 2015

10.1111/btp.12222

INSIGHTS

How Should Ecologists Define Sampling Effort? The Potential of Procrustes Analysis for Studying Variation in Community Composition

Victor S. Saito1,4, Alaide A. Fonseca-Gessner2, and Tadeu Siqueira3

1 Programa de Pós-Graduação em Ecologia e Recursos Naturais, UFSCar - Universidade Federal de São Carlos, São Carlos, SP, Brazil

2 Departamento de Hidrobiologia, UFSCar - Universidade Federal de São Carlos, São Carlos, SP, Brazil

3 Departamento de Ecologia, UNESP - Universidade Estadual Paulista, Rio Claro, SP, Brazil

ABSTRACT Defining an appropriate sampling effort is crucial for ecologists. Procrustes analysis can be used to tackle this problem by quantifying the match between subsamples and the complete dataset. We used stream macroinvertebrates to show how sampling design can be optimized by reducing the number of subsamples and increasing the number of sites. Abstract in Portuguese is available with online material. Key words: Procrustes analysis; sample size; sampling design.

HOW MANY SAMPLES TO COLLECT? This common issue faces community ecologists and conservation biologists when planning sampling design. Collecting more samples comes closer to capturing the actual community composition (Bonar et al. 2011), but can substantially increase the cost of a specific study. Although in experiments we can control for undesirable variation, the intrinsically high variation in observational studies may require large numbers of samples to adequately address ecological questions. Therefore, field ecologists must consider the costs and benefits of increasing the quality and completeness of data (Gotelli & Colwell 2009). Methods based on species accumulation curves can estimate the sampling effort necessary to gather the total richness of a site by adjusting observed with expected species richness in relation to the number of individuals or samples collected (Gotelli & Colwell 2001, Field et al. 2004). However, although species accumulation curves and related methods usually prove straightforward to interpret, they have disadvantages: (1) large numbers of individuals must be sampled to reach the asymptote (Chao et al. 2009); (2) the higher richness and the rarity of many species in the tropics make estimation of sampling effort more difficult (Gotelli & Colwell 2009); (3) many studies are interested not only in alpha diversity (i.e., local species richness), but also in beta and gamma diversity, and related patterns (i.e., community and metacommunity composition); and (4) local species richness does not necessarily predict local features (e.g., species dominance) and regional characteristics (e.g., beta diversity), and thus species accumulation curves cannot be used to estimate these characteristics (Gotelli & Colwell 2009). As each study has its own specific goals, sampling design depends on which characteristic of communities the study aims to investigate.
Received 21 October 2014; revision accepted 3 March 2015.

4 Corresponding author; e-mail: [email protected]

© 2015 The Association for Tropical Biology and Conservation

Thus, to analyze compositional dissimilarity, sampling effort should be assessed based on a multivariate technique that considers information on compositional data, expressed as a site per species matrix. For example, Cao et al. (2002) used compositional data, classification strength analysis, and dissimilarity coefficients to identify the effect of sampling effort in distinguishing different communities and ecoregions. Here, we also used compositional data, but with a special focus on the potential of Procrustes analysis (Gower 1971) to define the minimum sampling effort required in a study. We aimed to optimize sampling effort to capture metacommunity structure. We used this pilot project to define the optimal sampling effort for subsequent studies. Procrustes analysis can estimate the degree of association between two ordination-based matrices (Peres-Neto & Jackson 2001) and is therefore appropriate to match the composition of a varying number of pooled subsamples with the complete dataset from the pilot study. Furthermore, because Procrustes analysis can be computed based on the match between ordinations generated with different dissimilarity measures, we also could test if differences in matrix association resulted from qualitative characteristics (change in species composition, e.g., Jaccard coefficient) or quantitative characteristics (change in species abundance, e.g., Bray-Curtis coefficient). We tested these ideas with observational data on abundance and distribution of macroinvertebrates, mostly identified to the genus level, from forested streams in a coastal basin in the State of São Paulo (Southeast Brazil). We selected 13 streams with similar environmental characteristics (all minimally impacted low order streams) and collected 10 Surber samples per stream (30 cm × 30 cm; total area = 90 m2; mesh size = 250 µm; hereafter called subsamples).
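The contrast between qualitative and quantitative dissimilarity measures can be illustrated with a minimal Python sketch. The function names and the abundance vectors are hypothetical and chosen only for illustration; they are not data from the study. Two streams that hold the same genera in different abundances are identical under Jaccard but clearly different under Bray-Curtis.

```python
def jaccard_dissimilarity(a, b):
    """Presence/absence dissimilarity: 1 - (shared taxa / taxa present in either sample)."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_dissimilarity(a, b):
    """Abundance-based dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical abundances of four genera in two streams (illustrative only).
stream_a = [10, 5, 1, 0]
stream_b = [1, 5, 10, 0]

# Same genera present in both streams, so no qualitative difference.
print(jaccard_dissimilarity(stream_a, stream_b))
# Abundances differ strongly, so the quantitative measure is well above zero.
print(bray_curtis_dissimilarity(stream_a, stream_b))
```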
We used Procrustes analysis to find the minimum number of subsamples per stream necessary to get proper information on the composition of the whole metacommunity (all 13 streams). To do that, we constructed several compositional matrices, based on what was logistically feasible in terms of costs and time. The complete matrix included information from all 10 subsamples per stream pooled, resulting in a matrix of 13 rows (streams) × 68 columns (genera).

Subset matrices were constructed by using a subsample randomization protocol. For subset 1, we randomly selected one subsample from each of the 13 streams, resulting in a compositional matrix with 13 rows. For subset 2, we randomly selected two subsamples per stream and pooled the species abundances of the two subsamples from each stream, again resulting in a compositional matrix with 13 rows. We ran this procedure to create subsets based on one to nine Surber subsamples (i.e., subset 1 to subset 9). We repeated these procedures 1000 times to create 1000 subsets with one subsample, 1000 subsets with two subsamples, and so on up to nine.

For each group of subsets (e.g., 1000 random subsets of one Surber subsample), we conducted Principal Coordinate Analysis (PCoA) with four different dissimilarity measures to encompass a gradient from an emphasis on species composition changes (Jaccard) to an emphasis on abundance changes (Bray-Curtis, modified Gower, and Manhattan, as suggested by Anderson et al. (2006) for comparisons of beta diversity). We then used Procrustes analysis to estimate the association between the ordination patterns of the first four axes of these 1000 subsets and the ordination pattern (using PCoA) of the whole metacommunity matrix. Procrustes analysis measures the strength of the relationship between multivariate datasets using a rotational-fit algorithm (Peres-Neto & Jackson 2001). It produces an m²-statistic that can be transformed into an r-statistic for easier interpretation, r = √(1 − m²). The r-statistic can be interpreted simply as the match between ordinations (Peres-Neto & Jackson 2001, Lisboa et al. 2014).
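The protocol described above (random pooling of subsamples, PCoA on the first four axes, and the Procrustes r-statistic) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: all function names are hypothetical, only the Bray-Curtis measure is implemented, negative PCoA eigenvalues are simply clipped to zero, and the count array is simulated rather than real macroinvertebrate data.

```python
import numpy as np


def bray_curtis_matrix(abund):
    """Pairwise Bray-Curtis dissimilarities between rows (sites)."""
    abund = np.asarray(abund, dtype=float)
    diff = np.abs(abund[:, None, :] - abund[None, :, :]).sum(axis=2)
    total = abund.sum(axis=1)[:, None] + abund.sum(axis=1)[None, :]
    return np.divide(diff, total, out=np.zeros_like(diff), where=total > 0)


def pcoa(dist, n_axes=4):
    """Principal Coordinate Analysis (classical scaling) of a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Assumption: negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def procrustes_r(x, y):
    """Symmetric Procrustes fit of two ordinations; returns r = sqrt(1 - m^2)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    singular_values = np.linalg.svd(x.T @ y, compute_uv=False)
    m2 = max(0.0, 1.0 - singular_values.sum() ** 2)
    return float(np.sqrt(1.0 - m2))


def pooled_subset(counts, k, rng):
    """Pick k subsamples per stream without replacement and pool their abundances.

    counts has shape (streams, subsamples, taxa).
    """
    n_streams, n_sub, _ = counts.shape
    return np.array([
        counts[s, rng.choice(n_sub, size=k, replace=False)].sum(axis=0)
        for s in range(n_streams)
    ])


def r_distribution(counts, k, n_iter=1000, seed=0):
    """r-values matching n_iter random k-subsample subsets to the complete dataset."""
    rng = np.random.default_rng(seed)
    reference = pcoa(bray_curtis_matrix(counts.sum(axis=1)))
    return np.array([
        procrustes_r(pcoa(bray_curtis_matrix(pooled_subset(counts, k, rng))), reference)
        for _ in range(n_iter)
    ])


# Simulated counts standing in for real data: 13 streams x 10 subsamples x 68 genera.
sim_rng = np.random.default_rng(42)
stream_means = sim_rng.gamma(shape=0.5, scale=4.0, size=(13, 1, 68))
counts = sim_rng.poisson(stream_means, size=(13, 10, 68))

for k in (1, 5, 9):
    r_values = r_distribution(counts, k, n_iter=200)
    print(k, round(float(r_values.mean()), 3))
```

The same loop could be repeated with other dissimilarity functions in place of `bray_curtis_matrix`, and the resulting arrays of r-values plotted as histograms for each number of subsamples.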
Procrustes analyses based on the Jaccard, Bray-Curtis, and modified Gower coefficients gave similar patterns: a mean association value higher than r = 0.8 was reached with five Surber subsamples. For the Manhattan coefficient, a mean value of r = 0.8 was reached with eight Surber subsamples. To visualize the Procrustes results, we plotted histograms of the r-values for each group of 1000 subsets built from one to nine subsamples (Fig. 1). The Manhattan dissimilarity measure differed most from the Jaccard, modified Gower, and Bray-Curtis dissimilarities (Fig. 1). This metric is the sum of absolute differences in abundance over all species (Faith et al. 1987), meaning that it gives more emphasis to changes in abundance. This likely explains why it had the weakest association with the entire metacommunity matrix. Thus, detecting changes in abundance among communities typically requires more subsamples per site.

Deciding sampling effort can be crucial for the success or failure of empirical studies (Gotelli & Colwell 2001, Field et al. 2004). For example, one could collect too many subsamples per site and too few sites in space or time, and thus fail to observe the studied phenomenon. Alternatively, one could collect too few subsamples per site, so that community composition does not correlate with expected variables. Ferro and Melo (2011) used a similar approach with Procrustes analysis to analyze the effect of species richness on the similarity of composition among sites. As data gathering always involves cost-benefit trade-offs, the amount of information needed from each community is a study-specific question. In our example, five Surber subsamples per stream gave r = 0.80 for three dissimilarity measures. Figure 1 shows that increasing to six or seven subsamples would not substantially improve the correlation (values between about 0.80 and 0.90). This suggests that we could maintain the total number of Surber subsamples in the study by doubling the number of streams and halving the number of subsamples per stream (from 10 subsamples to five subsamples), making the spatial extent of the study larger. However, setting r = 0.80 was arbitrary, and the choice of cut-off values depends on study objectives and on cost-benefit trade-offs (Field et al. 2004). Our method allows the researcher to examine the whole distribution of r-values without assuming a cut-off value a priori, or even to use alternative cut-offs for the r-value for different purposes.

Another issue to consider is the total number of subsamples used to define the overall target. We used 10 Surber subsamples based on what was logistically feasible for our research group when considering the costs and labor involved in data processing. It is likely that 20 or 30 subsamples would lead to a different distribution of r-values and, consequently, to a different number of optimal subsamples. Cao et al. (2002), however, showed that both similarity coefficients and the strength of site and group separation tended to stabilize after pooling 10 Surber samples; their analysis included a maximum of 20 pooled Surber samples. Thus, we are not recommending 10 subsamples as a default for sampling stream macroinvertebrates. On the contrary, we suggest that studies investigating variation in community composition use Procrustes analysis to optimize the sampling design, with due consideration to scientific objectives and financial constraints.
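The reallocation argument reduces to simple arithmetic: pick the smallest number of subsamples whose mean r reaches the chosen cut-off, then divide the fixed total effort by that number. The sketch below assumes hypothetical function names and hypothetical mean r-values invented for illustration; they are not the values shown in Fig. 1. Only the pilot effort (13 streams × 10 subsamples) comes from the text.

```python
def minimum_subsamples(mean_r_by_k, cutoff):
    """Smallest number of subsamples whose mean r reaches the cutoff, or None."""
    for k in sorted(mean_r_by_k):
        if mean_r_by_k[k] >= cutoff:
            return k
    return None


def streams_for_fixed_effort(total_subsamples, subsamples_per_stream):
    """Number of streams that can be sampled with the same total effort."""
    return total_subsamples // subsamples_per_stream


# Hypothetical mean r-values per number of pooled subsamples (illustrative only,
# NOT the values reported in Fig. 1).
mean_r = {1: 0.45, 2: 0.58, 3: 0.68, 4: 0.75, 5: 0.81,
          6: 0.85, 7: 0.88, 8: 0.92, 9: 0.96}

total_effort = 13 * 10  # 13 streams x 10 Surber subsamples in the pilot study

for cutoff in (0.80, 0.90):
    k = minimum_subsamples(mean_r, cutoff)
    print(cutoff, k, streams_for_fixed_effort(total_effort, k))
```

With a cut-off of 0.80 and these illustrative values, five subsamples per stream suffice, so the same 130 subsamples would cover 26 streams, which is double the pilot's 13. A stricter cut-off shifts effort back toward subsamples within streams.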
Procrustes analysis is a powerful tool for ecologists, with good statistical power and flexibility across different types of data (Dijksterhuis & Gower 1991, Peres-Neto & Jackson 2001, Lisboa et al. 2014). Here, we demonstrate another approach that could be used in three situations. First, one might invest more effort in sampling variation between sites once within-site variation has been sufficiently subsampled. Second, this approach could be easily adapted to optimize the sampling of temporal variation in community composition. Finally, this approach could be used to compare studies with different sampling effort (i.e., different numbers of subsamples). In this case, one would need to rarefy communities to the smaller number of subsamples, reducing bias due to different sample sizes. The approach outlined here is a useful tool for optimizing sample allocation to efficiently address a wide range of questions in community ecology.

ACKNOWLEDGMENTS We acknowledge Cristiane Umetsu, João C. Nabout, Victor L. Landeiro and two anonymous reviewers for valuable comments on an early version of this manuscript. The writing of this study was partially funded by grants #2013/50424-1 and #2013/20540-0, São Paulo Research Foundation (FAPESP) and by grant #480933/2012-0, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

FIGURE 1. Histograms showing r-values of Procrustes analysis. Subsample subsets were compared with the total dataset using Jaccard, Bray-Curtis, modified Gower, and Manhattan dissimilarity coefficients.

LITERATURE CITED

ANDERSON, M. J., K. E. ELLINGSEN, AND B. H. MCARDLE. 2006. Multivariate dispersion as a measure of beta diversity. Ecol. Lett. 9: 683–693.

BONAR, S. A., J. S. FEHMI, AND N. MERCADO-SILVA. 2011. An overview of sampling issues in species diversity and abundance surveys. In A. E. Magurran, and B. J. McGill (Eds.). Biological Diversity: Frontiers in Measurement and Assessment, pp. 11–24. Oxford University Press, Oxford.

CAO, Y., D. P. LARSEN, R. M. HUGHES, P. L. ANGERMEIER, AND T. M. PATTON. 2002. Sampling effort affects multivariate comparisons of stream assemblages. J. North Am. Benthol. Soc. 21: 701–714.

CHAO, A., R. K. COLWELL, C.-W. LIN, AND N. J. GOTELLI. 2009. Sufficient sampling for asymptotic minimum species richness estimators. Ecology 90: 1125–1133.

DIJKSTERHUIS, G. B., AND J. C. GOWER. 1991. The interpretation of generalized Procrustes analysis and allied methods. Food Qual. Prefer. 3: 67–87.

FAITH, D. P., P. R. MINCHIN, AND L. BELBIN. 1987. Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69: 57–68.

FERRO, V. G., AND A. S. MELO. 2011. Diversity of tiger moths in a Neotropical hotspot: determinants of species composition and identification of biogeographic units. J. Insect Conserv. 15: 643–651.

FIELD, S. A., A. J. TYRE, N. JONZEN, J. R. RHODES, AND H. P. POSSINGHAM. 2004. Minimizing the cost of environmental management decisions by optimizing statistical thresholds. Ecol. Lett. 7: 669–675.

GOTELLI, N. J., AND R. K. COLWELL. 2001. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecol. Lett. 4: 379–391.

GOTELLI, N. J., AND R. K. COLWELL. 2009. Estimating species richness. In A. E. Magurran, and B. J. McGill (Eds.). Biological Diversity: Frontiers in Measurement and Assessment, pp. 39–54. Oxford University Press, Oxford.

GOWER, J. C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27: 857–871.

LISBOA, F. J. G., P. R. PERES-NETO, G. M. CHAER, E. D. C. JESUS, R. J. MITCHELL, S. J. CHAPMAN, AND R. L. L. BERBARA. 2014. Much beyond Mantel: bringing Procrustes association metric to the plant and soil ecologist's toolbox. PLoS ONE 9: e101238.

PERES-NETO, P. R., AND D. A. JACKSON. 2001. How well do multivariate data sets match? The advantages of a Procrustean superimposition approach over the Mantel test. Oecologia 129: 169–178.
