Ecology Letters, (2009) 12: E1–E5

TECHNICAL COMMENT

Erroneous behaviour of MixSIR, a recently published Bayesian isotope mixing model: a discussion of Moore & Semmens (2008)

Andrew L. Jackson,1* Richard Inger,2 Stuart Bearhop2 and Andrew Parnell3

1 Department of Zoology, School of Natural Sciences, Trinity College Dublin, Dublin 2, Ireland
2 Centre for Ecology & Conservation, School of Biosciences, University of Exeter, Cornwall Campus, Penryn, Cornwall, TR10 9EZ, UK
3 Department of Statistics, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2, Ireland
*Correspondence: E-mail: [email protected]

doi: 10.1111/j.1461-0248.2008.01233.x

Abstract

The application of Bayesian methods to stable isotopic mixing problems, including inference of diet, has the potential to revolutionise ecological research. Using simulated data we show that a recently published model, MixSIR, fails to correctly identify the true underlying dietary proportions more than 50% of the time and fails with increasing frequency as additional unquantified error is added. While the source of the fundamental failure remains elusive, mitigating solutions are suggested for dealing with additional unquantified variation. Moreover, MixSIR uses a formulation for a prior distribution that results in an opaque and unintuitive covariance structure.

Keywords

Bayesian statistics, model, stable isotope analysis.

INTRODUCTION

Stable isotope mixing models have become a key component of the ecologist's tool kit, have proved particularly effective in dietary studies (Inger et al. 2006), and have the potential to provide a powerful means to explore community structure (Bearhop et al. 2004; Layman et al. 2007; Inger & Bearhop 2008). Recent publication of the MixSIR Bayesian method (Moore & Semmens 2008) represents a natural progression from previous stable isotope analysis packages employed to analyse dietary data. The isotopic mixing model on which MixSIR is based (Isosource: Phillips & Gregg 2003) represented a step-change in ecological research and provided a critical tool for exploring a number of new systems with greater power than was previously possible. However, it demanded that only a single average value be provided for each component of the model, discarding important sources of variability within a system, and it produced pseudo-distributions of possible parameter combinations, making it difficult to draw conclusions about their ecological significance. MixSIR, on the other hand, offers to take all quantifiable uncertainty in the system and propagate it through to give a reliable estimate of the dietary composition of consumers.

However, close analysis of MixSIR's performance reveals an alarmingly high failure rate to correctly identify dietary compositions in a highly controlled and simple simulated system. Bayesian formulation remains the logical and most powerful method to address this system. The failure rate in MixSIR is likely due to a coding error, and easily implemented changes to the model formulation will give it more power and reliability.

SIMULATING ISOTOPE DATA

We define a simple system where a set of N consumers reflect the isotopic signature of their K food sources in proportion to the amount of each source in their diet:

$$X_{ij} = \sum_{k=1}^{K} p_k \left(S_{jk} + c_{jk}\right) + \varepsilon_{ij} \tag{1}$$

$$S_{jk} \sim N(\mu_{jk}, \omega_{jk}^2) \tag{2}$$

$$c_{jk} \sim N(\lambda_{jk}, \tau_{jk}^2) \tag{3}$$

$$\varepsilon_{ij} \sim N(0, \sigma_j^2) \tag{4}$$


where:

Xij = the observed value of isotope j for consumer i, with i = 1, …, N individual observations and j = 1, …, J isotopes.
Sjk = the value of source k on isotope j, with k = 1, …, K; normally distributed with mean μ and standard deviation ω.
pk = the dietary proportion of source k, to be estimated by the model.
cjk = the trophic enrichment factor for isotope j on food source k; normally distributed with mean λ and standard deviation τ.
εij = the residual error representing additional unquantified variation between individuals; normally distributed with mean 0 and standard deviation σ.

When there is no additional residual error (ε = 0), the data conform exactly to the model structure defined for MixSIR, and we would expect MixSIR to recover the dietary proportions given all the other data. In reality, however, there is likely to be additional variation between individual observations beyond that specified by trophic corrections, isotopic variation and diet, such as differences in physiology or pathology between individuals. We include this added realism by adding the ε term to MixSIR's original formulation, so that when σ² > 0 there is additional unspecified error. Such error will likely create problems in the fitting process. We test how sensitive MixSIR is to this breach of the model assumptions.
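The generative process in eqns 1–4 can be made concrete with a short simulation. What follows is a minimal Python sketch, not the Matlab code of Appendix S1. It uses the source means of Table 1, ω = 0.5, zero trophic enrichment and the fixed proportions p = (0.7, 0.2, 0.1) that the simulations in this comment use. The function name `simulate_consumers`, the seed, and the choice to draw a fresh set of source values for every consumer are assumptions made for illustration.

```python
import random

# Source means mu_jk from Table 1: one (isotope 1, isotope 2) pair per source.
SOURCE_MEANS = [(-10.0, 0.0), (0.0, 17.32), (10.0, 0.0)]
SOURCE_SD = 0.5            # omega, shared by all sources and isotopes
TRUE_P = (0.7, 0.2, 0.1)   # dietary proportions p_k


def simulate_consumers(n, resid_sd, rng):
    """Return n consumers, each a tuple of J = 2 isotope values (eqns 1-4).

    Trophic enrichment c_jk is fixed at zero. Assumption: every consumer
    gets its own independent draw of the source values S_jk.
    """
    consumers = []
    for _ in range(n):
        values = []
        for j in range(2):
            # Eqn 1 with c_jk = 0: proportion-weighted sum of source draws.
            mixture = sum(
                p * rng.gauss(mu[j], SOURCE_SD)
                for p, mu in zip(TRUE_P, SOURCE_MEANS)
            )
            # Eqn 4: residual error; resid_sd = 0 adds no extra variation.
            values.append(mixture + rng.gauss(0.0, resid_sd))
        consumers.append(tuple(values))
    return consumers


rng = random.Random(1)  # arbitrary seed, for reproducibility only
data_clean = simulate_consumers(10, 0.0, rng)   # sigma = 0
data_noisy = simulate_consumers(10, 1.0, rng)   # sigma = 1
```

With these settings the expected consumer value is (−6, 3.464), that is 0.7 × (−10) + 0.1 × 10 on isotope 1 and 0.2 × 17.32 on isotope 2, so simulated points should scatter around that location, and more widely when σ = 1.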

GENERATING SIMULATED DATA

Simulation offers the only truly reliable method to acquire data on which to test the performance of statistical models, as the full behaviour of the data can be specified a priori to conform to the model and, further, to break the assumptions of the model in a systematic manner. The model's predictions of key parameters can then be compared with the specified values, which are fully known a priori. Using eqns 1–4 we generated n = 10 independent data points based on a system comprising three food sources whose two distinct and independent isotopic values form the points of an equilateral triangle (Table 1), described by means (μ) and an associated standard deviation ω = 0.5 across all sources and isotopes. Without loss of generality, we set all the correction factors to zero (cjk = 0) in all simulations. The dietary proportions were fixed as p1 = 0.7, p2 = 0.2 and p3 = 0.1 for all simulations.

Table 1 The mean isotopic ratios (μ) used in the analysis of MixSIR for each of the three food sources on two isotopes

Source    Isotope 1    Isotope 2
1         −10          0
2         0            17.32
3         10           0

We illustrate MixSIR's failings on a single data architecture, as the within-architecture variation in performance is so high as to render a fully randomised trial difficult to interpret. Two exemplary simulated data sets for σ = 0 and σ = 1 are shown in Fig. 1. This data architecture should be a straightforward task for any mixing model, since a three-source, two-isotope problem would be analytically tractable were there no variation, and what variation is present is relatively small, particularly in Fig. 1a. Additional simulations on alternative architectures (data not shown) yield similarly poor fitting results. We generated 200 replicate datasets as above for each of four values of σ (0, 0.1, 0.5 and 1). The source code version of MixSIR was run within Matlab (The Mathworks 2007), as the compiled executable version does not facilitate easy translation into command line automated form. Some minor recoding was required to facilitate automated data generation and to allow the program to be distributed across multiple computers using the Distributed Computing Toolbox for computational speed (see Appendix S1). MixSIR was run for 10⁵ iterations with 10³ re-samples per iteration.

Figure 1 Exemplary data as used in the replicated simulations, generated according to the algorithm described in eqns 1–4 for σ = 0 (a) and σ = 1 (b). The true dietary proportions are p1 = 0.7, p2 = 0.2 and p3 = 0.1. Sources as labelled, consumer values as open circles.

We report the proportion of replicates whose 95% credible intervals of the estimated dietary proportions contained the true values (Table 2). Table 3 shows that MixSIR tends to overestimate the contribution from the largest dietary proportion and underestimate the smallest (ideally we would expect 50% of estimated values to be either side of the true value). Additionally, we show the actual simulation results for the 200 replicates for σ = 0 and σ = 1 (Fig. 2) in order to highlight the tendency for the spread in the estimated values to generally increase with residual error (cf. left- and right-hand panels).

Table 2 Percentage of estimates of pk from replicate simulations (200) whose 95% credible intervals contained the true underlying dietary proportion for a given food source (p1 = 0.7, p2 = 0.2 and p3 = 0.1)

                                            Dietary proportion
Standard deviation of residual error (σ)    p1     p2     p3
0                                           43     43     41
0.1                                         44     35     45
0.5                                         32     24     29
1.0                                         15     17     21

Note that for σ = 1, MixSIR crashed on 40 simulations due to an inability to fit the model, and the figures in the table represent the proportion of successful instances of MixSIR (i.e. out of 160 simulations in this case).

Table 3 Percentage of estimates of pk from replicate simulations (200) whose median estimates were greater than the true underlying dietary proportion for a given food source (p1 = 0.7, p2 = 0.2 and p3 = 0.1)

                                            Dietary proportion
Standard deviation of residual error (σ)    p1     p2     p3
0                                           61*    42*    44
0.1                                         63*    41*    44
0.5                                         67*    43*    42*
1.0                                         78*    40*    35*

Note that for σ = 1, MixSIR crashed on 40 simulations due to an inability to fit the model, and the figures in the table represent the proportion of successful instances of MixSIR (i.e. out of 160 simulations). All values in the table marked with * deviate significantly from the expected value of 50% (chi-squared test, P < 0.05).

THE ABILITY OF MIXSIR TO ESTIMATE DIETARY PROPORTIONS

Even when the simulated data conformed exactly to the mathematical formulation of MixSIR (σ = 0), the model successfully identified the true underlying dietary proportions at best 43% of the time: this is a highly surprising rate of failure. Indeed, we reformulated our own implementation of MixSIR in the R statistical program (R Development Core Team 2007) and found a much more acceptable rate of success of approximately 95% (data not shown). However, our implementation used an MCMC fitting process rather than Sample Importance Resampling (SIR) as employed in MixSIR. It therefore remains open whether the core failings of MixSIR result from a coding error or from inherent problems associated with the fitting process (certainly, our reformulation using MCMC suggests that the crashing of MixSIR in simulations when σ = 1 is likely a result of the SIR fitting process).

This core failing aside, there are two further trends associated with MixSIR that remain to be explained; both persist in our MCMC reformulation and have sound theoretical reasons underlying them. First, as additional unquantified error is increased in the data, MixSIR becomes less reliable in its estimates (successfully estimating the proportion of the largest dietary component in only 15% of successful runs when σ = 1: Table 2). Such a trend is to be expected, as there is nowhere in the model for this variation to be accommodated. Second, MixSIR consistently overrepresents the largest component of the diet, and inflates its importance as the additional residual error increases (Table 3). We believe that, in order for MixSIR to accommodate data with larger extra variation not accounted for explicitly in the model, it forces the variation to be explained by the distribution specifying the dietary proportions. It appears that this is achieved by inflating one of the dietary proportions at the expense of the others (Table 3 and Fig. 1).
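The percentages quoted above from Tables 2 and 3 can be put in perspective with two quick calculations. The Python sketch below is illustrative only. It takes the roughly 95% success rate of the MCMC reformulation as the benchmark coverage, rebuilds counts from the rounded percentages in the tables, and assumes a plain Pearson chi-squared test without continuity correction, since the text does not say which variant was used. The function names are hypothetical.

```python
import math


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def chi2_even_split(pct_above, n):
    """Pearson chi-squared test (1 d.f.) of a 50:50 split of n medians.

    Counts are rebuilt from rounded percentages, so they are approximate.
    Returns (statistic, p-value).
    """
    above = round(pct_above / 100 * n)
    expected = n / 2
    stat = ((above - expected) ** 2 + ((n - above) - expected) ** 2) / expected
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Table 2, sigma = 0, p1: 43% of 200 intervals (86) held the true value.
# Benchmark assumption: a working model covers the truth about 95% of the time.
tail = binomial_cdf(86, 200, 0.95)
print(f"P(86 or fewer of 200 | 95% coverage) = {tail:.1e}")

# Table 3, sigma = 0: 61% (p1, starred) and 44% (p3, unstarred) above the truth.
for pct in (61, 44):
    stat, p_value = chi2_even_split(pct, 200)
    print(f"{pct}% above: chi2 = {stat:.2f}, p = {p_value:.4f}")
```

Under these assumptions the coverage seen in Table 2 is vanishingly unlikely for a correctly working model. The chi-squared statistics are 9.68 (p of about 0.002) for the 61% entry and 2.88 (p of about 0.09) for the 44% entry, which agrees with the first being starred in Table 3 and the second not.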
This inflation mechanism is difficult to verify mathematically, however, because the awkward formulation of the prior distributions (see below) complicates the task.

We suggest that including a residual error term in the model itself and estimating both the dietary proportions (pk) and the variance of the residual error (σ²) would alleviate both issues simultaneously to a large extent. Obviously, with additional error beyond that specified with known parameters, model predictions will always be less certain, but including a residual error term will mitigate the extent of this effect. Our own Bayesian mixing model SIAR (Parnell et al. 2008) includes this term and appears to cope better with additional unquantified variation.
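To see what a residual term adds, it helps to write out the distribution of a single observation implied by eqns 1–4. This is a derivation sketch rather than a description of the MixSIR or SIAR code, and it assumes that the source values, enrichment factors and residuals are drawn independently for every observation. Conditional on the proportions p, Xij is then a sum of independent normal terms, so

$$X_{ij} \mid p \sim N\!\left(\sum_{k=1}^{K} p_k\,(\mu_{jk} + \lambda_{jk}),\; \sum_{k=1}^{K} p_k^2\,(\omega_{jk}^2 + \tau_{jk}^2) + \sigma_j^2\right)$$

For the simulation settings used here (λ = τ = 0, ω = 0.5, p = (0.7, 0.2, 0.1)), the source contribution to the variance is 0.25 × (0.49 + 0.04 + 0.01) = 0.135. When σ = 1 the total variance is 1.135, so about 88% of the spread in the consumer values comes from the residual term. A model that fixes σ = 0 has no parameter that can absorb this spread.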

Figure 2 The ability of MixSIR to recreate the known dietary proportion of each of the three food sources when σ = 0 and σ = 1. Error bars represent the 95% credible intervals of the estimated values. Note that in the right-hand column, when σ = 1, MixSIR failed to produce any estimate on 40 out of the 200 simulations: these data points have been omitted. (a) Proportion of source 1 in the diet: true value p1 = 0.7. (b) Proportion of source 2 in the diet: true value p2 = 0.2. (c) Proportion of source 3 in the diet: true value p3 = 0.1.

Another important concern with MixSIR is the manner in which prior information is incorporated into the model. The problem is that specifying prior proportions for each source on independent beta distributions is misleading, since they will be rescaled to sum to one before being incorporated into the model. That is, the intended character and shape of the input distributions will not be conserved in a transparent manner.
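The effect of rescaling independent Beta draws can be seen in a small Monte Carlo experiment. The Python sketch below is illustrative and is not taken from MixSIR: it assumes that rescaling means dividing each draw by the sum of the draws, and the Beta(2, 2) inputs, the seed and the helper names are hypothetical choices.

```python
import random


def rescaled_beta_draws(params, n_draws, rng):
    """Draw independent Beta(a, b) values, one per source, then rescale
    each set of draws so that the proportions sum to one.

    Assumption: rescaling is plain division by the sum of the draws.
    """
    draws = []
    for _ in range(n_draws):
        raw = [rng.betavariate(a, b) for a, b in params]
        total = sum(raw)
        draws.append([x / total for x in raw])
    return draws


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical input: the same Beta(2, 2) prior (mean 0.5) for three sources.
rng = random.Random(42)  # arbitrary seed
draws = rescaled_beta_draws([(2, 2)] * 3, 20000, rng)
p1 = [d[0] for d in draws]
p2 = [d[1] for d in draws]

print("specified prior mean for source 1: 0.5")
print(f"mean after rescaling:              {mean(p1):.3f}")
print(f"covariance of sources 1 and 2:     {covariance(p1, p2):.4f}")
```

By symmetry the rescaled means must each equal 1/3 rather than the 0.5 that was specified, and the covariance between any two proportions is negative even though the inputs were independent. Neither property is visible from the Beta parameters the user supplied.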

A more logical prior distribution is the Dirichlet (Evans et al. 2000), as employed in SIAR. Such a formulation allows users to specify mean proportions (that sum to unity) for each dietary source and a standard deviation for the first of these proportions, which it uses to generate K α values. The Dirichlet prior does not allow the user to specify individual uncertainties for each proportion, but the prior as input does match exactly what the model receives and uses to draw consistent proportions. The actual means and variances for the generated marginal distributions (Evans et al. 2000) can then be explored by defining:

$$\alpha_T = \sum_{k=1}^{K} \alpha_k \tag{5}$$

and then the characteristics of the distribution are given by:

$$\mathrm{E}(p_k) = \frac{\alpha_k}{\alpha_T} \tag{6}$$

$$\mathrm{var}(p_k) = \frac{\alpha_k\,(\alpha_T - \alpha_k)}{\alpha_T^2\,(\alpha_T + 1)} \tag{7}$$

$$\mathrm{cov}(p_k, p_w) = \frac{-\alpha_k\,\alpha_w}{\alpha_T^2\,(\alpha_T + 1)} \tag{8}$$

It is the covariance structure of MixSIR's formulation of the prior in particular that is the greatest cause for concern. The 'sum to unity' restriction imposes a negative correlation between the dietary proportions. This correlation is transparent and well understood in the Dirichlet scenario. The act of combining Beta distributions together (MixSIR) creates an unspecified and unknown covariance structure, which may not be appropriate for estimating dietary proportions and their correlations. As can be seen from eqn 7, large variance is achieved in a Dirichlet distribution when one of the α values is large in relation to the others, which corresponds with an inflation in the dietary proportion associated with that value (eqn 6). Such behaviour suggests a mathematical basis for the tendency towards inflation of the largest dietary proportion by MixSIR in order to maximise the variation in the model associated with increasing unincorporated residual error, but we cannot be certain given their Beta formulation.

CONCLUSIONS

In summary, MixSIR offers a huge step forward in the application of isotope mixing models in ecological research. However, it fails to accurately recreate even mathematically simple systems and suffers from the lack of a residual term, which is commonplace in regression models. Although we suspect a coding error somewhere in the model as the main source of error, we cannot rule out that this behaviour results from the choice of fitting algorithm (SIR). We offer suggestions for including a residual error term in order to alleviate some of the poor estimation associated with cases where additional unquantified error is present (one might imagine this to be commonplace in real-world examples, where all error sources are rarely known with complete accuracy). Further, we suggest moving to a Dirichlet distribution for specifying the prior dietary proportion distributions for transparency, rather than the mathematically unclear combined Beta distributions currently employed in MixSIR.

MixSIR is a promising start, and further theoretical work is required to develop a range of analytical techniques for analysing isotope data in a fully Bayesian framework now that researchers can have faith in the estimation of dietary proportions and their associated uncertainties. The problems outlined here are easily rectified and an extremely useful quantitative tool will result.

ACKNOWLEDGEMENTS

We thank two referees and an editor on previous versions of this manuscript for useful comments to improve it.

REFERENCES

Bearhop, S., Adams, C.E., Waldron, S., Fuller, R.A. & Macleod, H. (2004). Determining trophic niche width: a novel approach using stable isotope analysis. J. Anim. Ecol., 73, 1007–1012.
Evans, M., Hastings, N. & Peacock, B. (2000). Statistical Distributions, 3rd edn. John Wiley & Sons, New York.
Inger, R. & Bearhop, S. (2008). Applications of stable isotope analyses to avian ecology. Ibis, 150, 447–461.
Inger, R., Ruxton, G.D., Newton, J., Colhoun, K., Mackie, K., Robinson, J.A. et al. (2006). Using daily ration models and stable isotope analysis to predict biomass depletion by herbivores. J. Appl. Ecol., 43, 1022–1030.
Layman, C.A., Arrington, D.A., Montana, C.G. & Post, D.M. (2007). Can stable isotope ratios provide for community-wide measures of trophic structure? Ecology, 88, 42–48.
Moore, J.W. & Semmens, B.X. (2008). Incorporating uncertainty and prior information into stable isotope mixing models. Ecol. Lett., 11, 470–480.
Parnell, A., Inger, R., Bearhop, S. & Jackson, A.L. (2008). SIAR: Stable Isotope Analysis in R. http://cran.r-project.org/web/packages/siar/index.html
Phillips, D.L. & Gregg, J.W. (2003). Source partitioning using stable isotopes: coping with too many sources. Oecologia, 136, 261–269.
R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org
The Mathworks (2007). Matlab (R2007b).

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article.

Appendix S1 Matlab code.

Please note: Blackwell Publishing are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Editor, Tim Wootton
Manuscript received 4 July 2008
Manuscript accepted 10 July 2008

© 2009 Blackwell Publishing Ltd/CNRS
