Generalizing visual fast count estimators for underwater video surveys

JON BARRY, JACKIE EGGLETON, SUZANNE WARE, AND MATTHEW CURTIS

Centre for Environment, Fisheries and Aquaculture Science, Lowestoft Laboratory, Pakefield Road, Lowestoft, Suffolk NR33 0HT, United Kingdom

Citation: Barry, J., J. Eggleton, S. Ware, and M. Curtis. 2015. Generalizing visual fast count estimators for underwater video surveys. Ecosphere 6(12):249. http://dx.doi.org/10.1890/ES15-00093.1

Abstract. A generalized Visual Fast Count (GVFC) method of moments estimator is proposed for estimating epifaunal species abundance from underwater video survey transects. This formalises and provides a statistical framework for previous ad-hoc Visual Fast Count methods. For a single transect, we derive the expected value of the naive GVFC estimator and use this to create the method of moments estimator, which has reduced bias. A maximum likelihood estimator for multiple transects is derived. For illustration, our methods are applied to a series of video trawls at Folkestone Pomerania in the Dover Straits, UK. Although our methods have been developed for marine applications, they could also be applied to some terrestrial transect surveys.

Key words: abundance; Dover Straits, UK; maximum likelihood; transect surveys; underwater video surveys; Visual Fast Count.

Received 14 February 2015; revised 11 May 2015; accepted 13 May 2015; published 8 December 2015. Corresponding Editor: D. P. C. Peters.

Copyright: © Barry et al. 2015. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. http://creativecommons.org/licenses/by/3.0/

E-mail: [email protected]

INTRODUCTION

There is an increasing recognition of the need for robust and consistent approaches for the acquisition and interpretation of seabed imagery data to support assessment and monitoring objectives under a number of current policy drivers (e.g., in the U.K., Common Standards Monitoring under the Habitats Directive and indicator development in support of the Marine Strategy Framework Directive). To achieve the variety of objectives associated with data derived from underwater video and still images, these data need to be processed and analyzed so that they provide a comprehensive and standardized output that is suitable for all requirements. These requirements include marine habitat and biotope mapping, epifaunal community characterization and monitoring of the condition of benthic habitats and their associated fauna.

The Visual Fast Count (VFC) is a sub-sampling method that has been used to speed up counting of benthic assemblages from underwater video sampling. The standard method, first proposed by Kimmel (1985), works by dividing a video transect into segments. Initially, counts are made of individuals of all species present in the first segment. Counting then proceeds to the second segment, where counts are made only of individuals from species not seen in the first segment. This process continues until the last segment, such that once individuals from a species are counted in a segment, that species is not counted again. Thus, the method produces only a single non-zero segment count for each species. For example, the counts for a particular species first found in the third segment might be 0, 0, 4.

Barry and Coggan (2010) give examples of the application of the Visual Fast Count; several of these (e.g., Strong et al. 2006) are based on data from Strangford Lough, Northern Ireland, where there is concern about the effects of demersal fishing activities on seabed communities. Schobernd et al. (2014) evaluate various methods to determine fish counts using underwater videos. There are also examples in terrestrial ecology. For example, Freeman et al. (2003) describe the use of a method similar to VFC for estimating avian abundance indices.

Barry and Coggan (2010) showed that the VFC method, on average, overestimates species' densities, particularly for rare species. Essentially, this positive bias arises because sampling for any particular species stops on a positive species count; that is, the procedure does not constitute a proper random sample of segments within a transect. Barry and Coggan also proposed new estimators that produce better results than VFC. The best of these was a method of moments estimator based on the assumption that counts of a particular species follow a negative binomial distribution. Barry and Coggan also showed that VFC can improve efficiency in terms of the time taken to count creatures from video segments; in the two small trials that they conducted, they found that the efficiency factors (the time taken to count the whole transect divided by the time taken to do the VFC counting) were 3.2 and 2.6.

The VFC method as it has been defined by previous authors is specific in its definition. Five segments have usually been considered, segments are examined from the beginning of the transect, and counting stops for a particular species after a single count greater than zero is observed in any of the segments sampled. Some obvious generalizations are:

(1) The number of segments $s$;
(2) The threshold $t$, above which the species is deemed to have been counted in that segment;
(3) The number of segments $d$ for which the count needs to be above the threshold before sampling stops.

Thus, the standard VFC method as defined by Kimmel (1985) has $s = 5$, $t = 0$, $d = 1$. Of these possible generalizations, we think that (1) and (3) are worth pursuing to generate a Generalized Visual Fast Count (GVFC) method. However, we suggest that a threshold above zero would not be useful in practice and so we will fix it at $t = 0$.

In this paper we do the following:

(1) Extend the work of Barry and Coggan (who derived an estimator only for the case $d = 1$) by deriving a method of moments estimator (GVFCMOM) for the GVFC method. We demonstrate that this estimator has reduced bias. We provide R software in a freely available library to calculate this estimator. We note that this estimator can be useful for the situation in which a unique estimate of density is required for each transect;
(2) Derive the maximum likelihood estimators for the situations in which a single estimate of species density is required over a whole region, and where segments within transects are essentially sampling points in that region;
(3) Demonstrate the use of the GVFCMOM point estimator and maximum likelihood using data from a survey of Folkestone Pomerania, UK.

All computing was done using the statistical package R (R Core Team 2014). The function GVFCMOM from the library emon (https://r-forge.r-project.org/projects/emon/) was developed for this work and is used to calculate the GVFCMOM estimates.

METHODS

Method of moments estimator

Throughout this paper we use the following notation:

$s$ = number of segments per transect,
$d$ = defined number of positive segment counts before sampling stops,
$z$ = number of zero segment counts,
$m$ = number of non-zero segment counts ($m \le d$),
$x_j$ = value of the $j$th non-zero segment count,
$\lambda$ = expected value of the density per transect,
$\mu$ = expected value of the density per segment $= \lambda/s$,
$n$ = number of transects.

Note that when deriving the GVFC and the GVFCMOM estimators below, we should have added a subscript $t$ to $\lambda$ to denote that each transect could have a unique expected density. However, this would make some of the formulae and explanation mathematically cumbersome and so we have omitted the subscript. When we later consider maximum likelihood estimation, we assume a homogeneous $\lambda$ across the survey area and so no subscript is needed.

For a particular species, the naive GVFC estimator for the abundance of the species per segment is calculated as the total number of individuals in the $m$ segments in which the species was observed divided by the number of segments considered. The estimator can be scaled up to estimate the density over the whole transect by multiplying it by the number of segments $s$. We can write this mathematically as

\[
\hat{\lambda} = \mathrm{GVFC}(s, d) = \frac{s \sum_{j=1}^{m} x_j}{m + z}. \tag{1}
\]

Barry and Coggan suggested a method of moments estimator derived from equating the expectation of the VFC estimator with the sample estimate, and then solving numerically for $\lambda$. We adopt a similar method here, but for the generalized approach. Denote by $p_0$ the probability of zero individuals in a segment, and assume that the numbers per segment are independent of the numbers in other segments and that counts per segment are made without error. To calculate the expected value of $\mathrm{VFC}(s, d)$, we need to take expected values over the possible combinations of counts considered and then over the values of those counts $x_j$ (call this expectation $E_2$, noting that the counts are themselves random variables). To illustrate the first of these expectation steps for the trivial case with $s = 2$ and $d = 1$, and denoting a positive count as 1 and a zero count as 0, we would have

\[
E[\mathrm{VFC}(s, d)] = E_2[\mathrm{VFC}(1)\,\mathrm{pr}(1) + \mathrm{VFC}(01)\,\mathrm{pr}(01) + \mathrm{VFC}(00)\,\mathrm{pr}(00)]. \tag{2}
\]

Note also that the expected value of the count in a segment, given that the count is constrained to be positive, is

\[
E[X_j \mid X_j > 0] = \lambda / [s(1 - p_0)]. \tag{3}
\]

To generate the general expectation, it is instructive to begin with the expectation for low values of $d$. First consider $d = 1$. From Eq. 1 and Eq. 3, and because the probability of a positive count is $(1 - p_0)$, the contribution to the overall expectation of the GVFC estimator from situations where there is a positive count in the first segment is $\lambda$. The contribution to the expectation for situations where counting stops in the second segment is $\lambda p_0 / 2$. Carrying on this process over all $s$ segments yields the expectation

\[
E[\mathrm{GVFC}(s, d = 1)] = \lambda \sum_{i=1}^{s} p_0^{\,i-1} / i. \tag{4}
\]

Thus, for example, when $s = 5$ and $d = 1$ we have:

\[
E[\mathrm{GVFC}(s = 5, d = 1)] = \lambda (1 + p_0/2 + p_0^2/3 + p_0^3/4 + p_0^4/5). \tag{5}
\]

Now consider the case $d = 2$. The first time counting can stop is in the second segment, when both the first and second segments are positive. The contribution to the expectation here is $\lambda(1 - p_0)$. If counting stops in the third segment, then the contribution to the expectation has to take into account the fact that the first positive can occur in either position 1 or 2 (the second positive is fixed at position 3), yielding the contribution $4\lambda(1 - p_0)p_0/3$. The same arguments apply up to segment $s$. However, we also have to consider an additional contribution from the situation where only one positive is achieved in the $s$ segments. This positive could occur in any of the $s$ segments and so the contribution to the expectation is $\lambda p_0^{\,s-1}$. For example, including this term when $s = 5$ gives the overall expectation

\[
E[\mathrm{GVFC}(s = 5, d = 2)] = \lambda p_0^4 + (1 - p_0)\lambda (1 + 4p_0/3 + 6p_0^2/4 + 8p_0^3/5). \tag{6}
\]

If we go through similar arguments to those above for $d > 2$, we can generate the expectation for GVFC estimates as a function of $s$ and $d$. Throughout, we need to remember that when $m = d$ (i.e., when $d$ positive segments are found), the last positive must occur on the last segment examined. The expectation is given by:

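\[
E[\mathrm{GVFC}(s, d)] = (1 - p_0)^{d-1}\, d\, \lambda \sum_{i=d}^{s} \binom{i-1}{d-1} \frac{p_0^{\,i-d}}{i} + \frac{\lambda}{s} \sum_{i=1}^{d-1} i\, (1 - p_0)^{i-1}\, p_0^{\,s-i} \binom{s}{i}. \tag{7}
\]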
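As a numerical check on Eq. 7, the following minimal Python sketch evaluates the general expectation and compares it with the special cases in Eqs. 5 and 6. The function name `expected_gvfc` and the example values of $\lambda$ and $s$ are illustrative assumptions rather than part of the original analysis, and the zero probability is taken from the Poisson model, $p_0 = \exp(-\lambda/s)$.

```python
from math import comb, exp


def expected_gvfc(lam, s, d, p0):
    """Expected value of the naive GVFC(s, d) estimator, following Eq. 7.

    lam: expected density per transect; s: segments per transect;
    d: positive counts required before sampling stops; p0: P(zero count).
    """
    # Sampling stops at segment i, where the d-th positive count occurs.
    stopped = (1 - p0) ** (d - 1) * d * lam * sum(
        comb(i - 1, d - 1) * p0 ** (i - d) / i for i in range(d, s + 1)
    )
    # Only i < d positives are found in all s segments, so sampling never stops early.
    not_stopped = (lam / s) * sum(
        i * (1 - p0) ** (i - 1) * p0 ** (s - i) * comb(s, i) for i in range(1, d)
    )
    return stopped + not_stopped


# Illustrative values (assumed, not taken from the survey data).
lam, s = 2.0, 5
p0 = exp(-lam / s)  # Poisson assumption for the count per segment

# Special cases written out as in Eq. 5 (d = 1) and Eq. 6 (d = 2).
eq5 = lam * (1 + p0 / 2 + p0**2 / 3 + p0**3 / 4 + p0**4 / 5)
eq6 = lam * p0**4 + (1 - p0) * lam * (1 + 4 * p0 / 3 + 6 * p0**2 / 4 + 8 * p0**3 / 5)

print(expected_gvfc(lam, s, 1, p0), eq5)
print(expected_gvfc(lam, s, 2, p0), eq6)
```

Each printed pair should agree to rounding error, which confirms that the general formula reduces to the worked cases for $s = 5$.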



In Eq. 7, $\binom{C}{R} = C!/[R!(C - R)!]$ is the choose operator. The expectation in Eq. 7 allows us to create a generalized method of moments estimator (GVFCMOM) in a similar way to Barry and Coggan (2010), who produced an estimator for the case $d = 1$. We simply equate the observed GVFC estimate in Eq. 1 with its expectation in Eq. 7 and solve for $\lambda$, giving the estimator $\hat{\lambda}$. This requires use of a numerical optimiser because $p_0$ (see below) is a non-linear function of $\lambda$.

The form of $p_0$ depends on one's belief about the underlying distribution of the animals whose density is to be estimated. If we assume that they are randomly and uniformly distributed, then the number of individuals per segment will have a Poisson distribution. Thus, $p_0 = \exp(-\lambda/s) = \exp(-\mu)$. If individuals have a more clustered distribution then we can use the Negative Binomial distribution to model the count per segment. For this distribution, $p_0 = [1 + \lambda/(sk)]^{-k} = [1 + \mu/k]^{-k}$, where $k$ determines the amount of clustering (low $k$ implies high clustering). Barry and Coggan found that $k = 1$ worked well in their simulation studies.

One way to obtain standard errors or confidence intervals for the GVFCMOM estimator is by parametric bootstrapping (Manly 2006). This involves repeated simulation of segment counts using the assumed distribution (i.e., Poisson or Negative Binomial) and the observed GVFCMOM estimate for the mean. Standard errors and confidence intervals are obtained from the vector of, say, 1000 bootstrap GVFCMOM estimates from these simulated data.

Maximum likelihood estimation

When creating a combined estimator over multiple transects, it is useful to embed the GVFC data in the well-understood statistical estimation procedure of maximum likelihood (ML). This has, for example, the benefit of giving simple formulae for standard errors. A similar approach was taken by Freeman et al. (2003). We make the assumption of homogeneity: that there is the same underlying expected density over the whole area and thus that the expected density for each segment is $\mu$, irrespective of which transect it is contained in. We also assume that all observations are independent. These are clearly strong assumptions and we discuss them further in the discussion. We derive ML estimators for both the Poisson and the Negative Binomial distributions, and briefly outline these below.

Assume that a species is randomly and independently distributed in space so that the number of animals per segment has a Poisson distribution. The likelihood over all transects is then

\[
L(\mu; x, z, m) = \prod_{t=1}^{n} \exp(-z_t \mu) \prod_{j=1}^{m_t} \frac{\mu^{x_{tj}} \exp(-\mu)}{x_{tj}!} \tag{8}
\]

where $z_t$ is the number of zero segment counts in the $t$th transect, $m_t$ is the number of positive segment counts in the $t$th transect, and $x_{tj}$ is the $j$th positive segment count within the $t$th transect. Differentiating the ln-likelihood with respect to $\mu$ and setting the result to zero yields the maximum likelihood estimator

\[
\hat{\mu} = \frac{\sum_{t=1}^{n} \sum_{j=1}^{m_t} x_{tj}}{\sum_{t=1}^{n} m_t + \sum_{t=1}^{n} z_t} \tag{9}
\]

and hence

\[
\hat{\lambda} = s\hat{\mu}. \tag{10}
\]

Note that $\hat{\mu}$ is the mean of all the segment counts. The standard error is calculated from the inverse of the negative second derivative of the ln-likelihood with respect to $\mu$. This gives

\[
\mathrm{se}(\hat{\lambda}) = \frac{s \sqrt{\sum_{t=1}^{n} \sum_{j=1}^{m_t} x_{tj}}}{\sum_{t=1}^{n} m_t + \sum_{t=1}^{n} z_t}. \tag{11}
\]
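The Poisson estimator in Eqs. 9 to 11 amounts to pooling every examined segment count. The Python sketch below illustrates this under stated assumptions: the function name `poisson_ml` and the transect records are hypothetical, each transect is stored as the list of segment counts actually examined (zeros included), and all transects are assumed to share the same number of segments $s$.

```python
from math import sqrt


def poisson_ml(transects, s):
    """Poisson ML estimates from pooled GVFC segment counts (Eqs. 9-11).

    transects: list of lists, each holding the segment counts examined
               on one transect (zeros and positives).
    s: number of segments per transect (assumed equal for all transects).
    """
    total_count = sum(sum(t) for t in transects)   # sum of all x_tj
    n_segments = sum(len(t) for t in transects)    # sum of m_t plus sum of z_t
    mu_hat = total_count / n_segments              # Eq. 9
    lam_hat = s * mu_hat                           # Eq. 10
    se_lam = s * sqrt(total_count) / n_segments    # Eq. 11
    return mu_hat, lam_hat, se_lam


# Hypothetical GVFC records with d = 2 and s = 10: counting on a transect
# stops at the second positive count, or after all 10 segments.
transects = [
    [0, 3, 0, 1],
    [2, 4],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 5, 0, 0, 1],
]
mu_hat, lam_hat, se_lam = poisson_ml(transects, s=10)
print(mu_hat, lam_hat, se_lam)
```

An approximate 95% confidence interval could then be formed as `lam_hat ± 1.96 * se_lam`, subject to the homogeneity and independence assumptions stated above.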

It can be seen that $\hat{\lambda}$ in Eq. 10 for multiple transects is essentially the same as the naive GVFC estimator in Eq. 1 for a single transect.

A similar maximum likelihood approach can be used for the Negative Binomial distribution. The likelihood is given by:

\[
L(\mu, k; x, z, m) = \prod_{t=1}^{n} \frac{(1 + \mu/k)^{-(z_t + m_t)k}}{[\Gamma(k)]^{m_t}} \prod_{j=1}^{m_t} \frac{\Gamma(k + x_{tj})}{x_{tj}!} \left(1 + \frac{k}{\mu}\right)^{-x_{tj}}. \tag{12}
\]

Proceeding as before, the maximum likelihood estimator for $\mu$ is the same as in Eq. 9, i.e., the sample mean. Thus, the estimator for $\lambda$ is as shown in Eq. 10. The standard error is

\[
\mathrm{se}(\hat{\lambda}) = s \sqrt{\frac{\hat{\mu}(\hat{k} + \hat{\mu})}{\left(\sum_{t=1}^{n} m_t + \sum_{t=1}^{n} z_t\right)\hat{k}}}. \tag{13}
\]

For both the Poisson and Negative Binomial likelihoods, the R function fitdistr() in library MASS can be used to obtain an estimate and standard error for $\mu$.

Folkestone Pomerania survey

A Defra-commissioned survey was carried out in the Folkestone Pomerania by the Centre for Environment, Fisheries and Aquaculture Science (Cefas) to increase the evidence base to support proposals for designation of a Marine Conservation Zone (MCZ). The resultant MCZ is an area of around 26 km², located in the narrowest part of the Dover Straits, UK. This site is characterized by several large depressions in the seabed, falling from around 22 m to 30 m, with exposed rock ledges and flat or gently sloping boulder-strewn platforms at the upper edges. A full description of the survey results is given in Whomersley et al. (2012). Here, however, we use a subset of data from this survey to illustrate the concepts of GVFC estimation described above. We focus on video tows for the species Asterias rubens from 21 sites. The data are counts from 1-minute video segments. Tow lengths per transect were 10 minutes except for sites C11 and R06 (9 minutes) and C26 (7 minutes). Although the full data set was available, we include only data for a GVFC estimator with $d = 2$ (i.e., the first two positive counts in a transect).

RESULTS

The generalized expectation in Eq. 7 was used to calculate the bias of the GVFC and GVFCMOM estimators. The bias was defined as a percentage of $\lambda$, given by: % bias = $100[E(\mathrm{GVFC}) - \lambda]/\lambda$. Assuming that individuals have a random spatial distribution, and so $p_0 = \exp(-\lambda)$, this % bias is plotted in Fig. 1 as a function of the mean $\lambda$ for $d = 1$ to 3 (black lines). We can see that the bias reduces strongly as $d$ and $\lambda$ increase. Comparing the grey (GVFCMOM) and the black (GVFC) lines in Fig. 1 shows how the GVFCMOM estimator has reduced bias. This difference is clear from Fig. 1 for $d = 1$ and $d = 2$; however, the GVFC bias is small for $d = 3$ and so there is little room for improvement for the GVFCMOM estimator.

For the Folkestone Pomerania data, animal density estimates were standardized so that they were in numbers per 100 m². Trawl speed was set to be 0.5 knots. This equates to 15.4333 m per minute. Given that the video window was approximately 0.93 m, to convert densities per transect to densities per 100 m², we multiply by the constant $C = 6.987/s$, where $s$ is the number of segments on that transect. Note that the video camera is angled obliquely (not facing straight down), so the field of view varies slightly across the image.

In shallow coastal waters, Asterias rubens sometimes occurs in dense aggregations of up to 100 specimens/m² (Sloan 1980). Dare (1982) reported an aggregation of Asterias rubens in Morecambe Bay, UK. Thus, in terms of its spatial pattern, we have assumed that Asterias rubens has a clustered distribution and so counts are more likely to follow the Negative Binomial distribution. The value of $k = 0.42$ used for the GVFCMOM estimator was obtained from the maximum likelihood estimate. The maximum likelihood estimate for the density per 100 m² over the whole region was 3.08, with 95% confidence interval (1.89, 4.27). The point estimates for the individual tows are illustrated in Fig. 2, where the area of the circle represents the GVFCMOM estimate. It is difficult to pick out strong patterns in density across the site, although density is perhaps lowest in the middle section.
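The bias calculation and the method of moments correction can be sketched numerically as follows. This is an illustrative Python sketch, not the emon implementation: the names `p0_poisson`, `expected_gvfc`, `percent_bias`, and `gvfcmom_sketch` are hypothetical, the zero probability uses the Poisson form $p_0 = \exp(-\lambda/s)$ from the Methods (another form can be passed in through `p0_fn`), and the bisection solver assumes that the expectation in Eq. 7 increases with $\lambda$.

```python
from math import comb, exp


def p0_poisson(lam, s):
    """P(zero count in a segment) under the Poisson model, exp(-lam/s)."""
    return exp(-lam / s)


def expected_gvfc(lam, s, d, p0_fn=p0_poisson):
    """Expected value of the naive GVFC(s, d) estimator (Eq. 7)."""
    p0 = p0_fn(lam, s)
    stopped = (1 - p0) ** (d - 1) * d * lam * sum(
        comb(i - 1, d - 1) * p0 ** (i - d) / i for i in range(d, s + 1)
    )
    not_stopped = (lam / s) * sum(
        i * (1 - p0) ** (i - 1) * p0 ** (s - i) * comb(s, i) for i in range(1, d)
    )
    return stopped + not_stopped


def percent_bias(lam, s, d):
    """Percentage bias of the naive estimator: 100 [E(GVFC) - lam] / lam."""
    return 100.0 * (expected_gvfc(lam, s, d) - lam) / lam


def gvfcmom_sketch(gvfc_obs, s, d, iterations=200):
    """Solve E[GVFC(s, d)] = gvfc_obs for lam by bisection.

    Assumes the expectation is increasing in lam (an assumption of this sketch).
    """
    if gvfc_obs <= 0:
        return 0.0
    lo, hi = 0.0, gvfc_obs
    while expected_gvfc(hi, s, d) < gvfc_obs:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_gvfc(mid, s, d) < gvfc_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: feed the expected naive estimate back into the solver.
naive = expected_gvfc(3.0, s=5, d=2)
recovered = gvfcmom_sketch(naive, s=5, d=2)
print(percent_bias(3.0, 5, 1), percent_bias(3.0, 5, 2), recovered)
```

Feeding the expected naive estimate back through the solver should return the value of $\lambda$ that generated it, which is the sense in which the method of moments step removes the bias of the naive estimator on average.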

DISCUSSION

It has previously been shown that the VFC method can save considerable time when processing underwater videos (Barry and Coggan 2010). However, there remains the problem of positively biased results from the uncorrected VFC method. Our view is that the correct approach depends on what you want from your survey. Two likely purposes for the survey are as follows.

(1) A series of spatial point estimates of species density (or relative density) is needed. These could then be used, for example, to map species density over the area. If you want to use the GVFC method to save processing time, then you should use the GVFCMOM estimator with either $d = 1$ or $d = 2$, depending on the relative importance of reduced time (low $d$) or reduced bias (high $d$).

(2) A single estimate of the species density over a survey area is needed. The maximum likelihood estimator can then be used, which involves pooling information from all of the visual fast counts. This estimator will have reduced processing time compared to counting each segment (the reduction will depend on the value of $d$ used) but, assuming homogeneity of abundance, will have negligible bias as the number of transects increases.

We need to be clear, however, on the assumptions that are implicit in our approach. For the GVFC point estimates, we assume independence between segment counts. If animals are randomly and uniformly distributed, then this assumption will hold true. However, if the location of animals arises from some sort of cluster process (Diggle 2013), then this could induce correlations between the counts. Work is needed, using simulated data or based on full transect counts from real data, to assess the robustness of our methods or to generate new, more sophisticated methods.

As stated above, our maximum likelihood estimator assumes homogeneity of density across the survey area. Our suggested way to deal with this in the current paper is to use GVFCMOM point estimates. Another approach would be to incorporate transect random effects into the likelihood together with information on external environmental variables that could be influencing abundance. However, if we are to avoid bias, our view is that the GVFC sampling approach would have to be built into the construction of the likelihood. Clearly, further work is needed on such methods and also to explore the robustness of the simple maximum likelihood estimator to departures from homogeneity. Similar work has been carried out by Conn et al. (2015).

Essentially, when we calculate the maximum likelihood estimator from multiple transects, we can think of the situation as stacking all the data from the individual transects into a single string of data. Assuming homogeneity, the bias of the estimate of $\lambda$ decreases as the number of transects increases. To see this, imagine that you are performing a GVFC operation with $d = 1$ (i.e., you stop counting segments of a transect for a species once you find a single, positive segment count for that species). If you combine such data from $n$ transects into a single data string then, assuming that the species was detected in each transect, you effectively have a GVFC data string with $d = n$. We have illustrated in Fig. 1 what we might predict intuitively: that the bias of the naive VFC estimator for estimating $\lambda$ reduces as the value of $d$ increases. Hence, because the maximum likelihood and the naive GVFC estimators are identical, the bias of the maximum likelihood estimator reduces as the number of transects increases.

Underwater video provides data on both physical habitat characteristics and the associated epibenthic fauna and can be used where grab sampling is not suitable. So far, over 400 hours of video have been collected to support the designation of Marine Protected Areas in the UK. The analysis of this video can be both time-consuming and costly, and GVFC would help to reduce both. Clearly, however, the final decision as to whether to use GVFC depends on the balance between cost and precision.

Fig. 1. Bias of GVFC estimators for $d = 1, 2, 3$ (black lines) and their equivalent method of moments estimators (grey lines).

Fig. 2. Map showing location of Folkestone Pomerania and GVFCMOM point estimates for each tow. The area of the circle represents the magnitude of the estimate. The three zero estimates are represented by black dots.

LITERATURE CITED

Barry, J., and R. Coggan. 2010. The Visual Fast Count method: a critical examination and development for underwater video sampling. Aquatic Biology 11:101–112.

Conn, P. B., D. S. Johnson, J. M. Ver Hoef, M. B. Hooten, J. L. London, and P. L. Boveng. 2015. Using spatio-temporal statistical models to estimate animal abundance and infer ecological dynamics from survey counts. Ecological Monographs 85:235–252.

Dare, P. J. 1982. Notes on the swarming behaviour and population density of Asterias rubens L. (Echinodermata: Asteroidea) feeding on the mussel Mytilus edulis. Journal du Conseil Permanent International pour l’Exploration de la Mer 40:112–118.

Diggle, P. 2013. Statistical analysis of spatial and spatio-temporal point patterns: Third edition. Chapman and Hall, London, UK.

Freeman, N. F., D. E. Pomeroy, and H. Tushabe. 2003. On the use of timed species counts to estimate avian abundance indices in species-rich communities. African Journal of Ecology 41:337–348.

Kimmel, J. J. 1985. A new species-time method for visual assessment of fishes and its comparison with established methods. Environmental Biology of Fishes 12:23–32.

Manly, B. F. J. 2006. Randomization, bootstrap and Monte Carlo methods in biology: Third edition. Chapman and Hall, London, UK.

R Core Team. 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Schobernd, Z. H., N. M. Bacheler, and P. B. Conn. 2014. Evaluating the utility of alternative video monitoring metrics for indexing reef fish abundance. Canadian Journal of Fisheries and Aquatic Sciences 71:464–471.

Sloan, N. A. 1980. Aspects of the feeding biology of Asteroids. Oceanography and Marine Biology: an Annual Review 18:57–124.

Strong, J. A., M. Service, and A. J. Mitchell. 2006. Application of the visual fast count for the quantification of temperate epibenthic communities from video footage. Journal of the Marine Biological Association of the United Kingdom 86:939–945.

Whomersley, P., J. Rance, K. Vanstaen, and A. Callaway. 2012. Folkestone Pomerania rMCZ Survey Report. Cefas report ref FP/13/B/06-12, Lowestoft, UK.

SUPPLEMENTAL MATERIAL

ECOLOGICAL ARCHIVES

The Supplement is available online: http://dx.doi.org/10.1890/ES15-00093.1.sm
