Weak identifiability in mark-recapture and mark-recovery models

O. Gimenez,1,2 S.P. Brooks3 and B.J.T. Morgan1,∗

1 Institute of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent, CT2 7NF, UK
2 Centre d'Ecologie Fonctionnelle et Evolutive - CNRS, 1919 Route de Mende, 34293 Montpellier Cedex 5, France
3 Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK

December 12, 2005

∗ email: [email protected]

Summary. We review essential ideas of parameter-redundancy and model identifiability, in both Bayesian and classical contexts. The 35% guideline for overlap between prior and posterior distributions, suggested as an indicator of weak identifiability, is evaluated for a range of mark-recapture-recovery data, for which the parameter-redundancy status is established. The percentage overlap between prior and posterior distributions is obtained easily from the output of MCMC samplers. It is shown that as long as uniform prior distributions are adopted for all of the model parameters then the suggested guideline works well. It is recommended for use in this area, which is seeing increased application of Bayesian methods.

Key words: Bayesian identifiability; Conditional non-identifiability; Marginal non-identifiability; Mark-recapture-recovery models; Parameter-redundancy; Prior/posterior overlap; Survival of wild animals.

1. Introduction

In the last forty years, a challenging and important research topic in biometry has been the estimation of wild animal survival. This usually relies on the monitoring of individuals via longitudinal studies. Individuals are marked on a series of sampling occasions. On subsequent occasions, either it is recorded whether each marked animal is seen or not, which gives rise to capture-mark-recapture data (Lebreton et al., 1992), or the marks of dead animals are recovered, which gives rise to capture-mark-recovery data (Brownie et al., 1985). The two protocols sometimes occur together (see for example, Catchpole et al., 1998), and in addition animals may be re-sighted as well as, or instead of, being recaptured. The likelihoods constructed from probability models for recaptures and recoveries may share the same product-multinomial structure, and may be maximised by user-friendly computer programs such as MARK (White and Burnham, 1999) or M-SURGE (Choquet et al., 2005) in order to produce classical maximum-likelihood estimates.

However, models can be proposed for the data without all the parameters being estimable, a phenomenon which is usually referred to as non-identifiability. More formally, a model is defined to be identifiable if no two values of the parameters give the same likelihood for the data; over-parameterized models result in likelihood surfaces with completely flat ridges or surfaces. So-called redundant models can be re-expressed as a function of fewer than the original number of parameters (Catchpole and Morgan, 1997). The obvious way to check for parameter redundancy for a particular application is to examine the likelihood surface by computing the rank of the observed Hessian (Viallefont et al., 1998; Formann, 2003). Catchpole and Morgan (1997) considered the rank of the model itself, regardless of the data, using symbolic algebra. It is then possible to determine how many and which parameters (or combinations of these) can be estimated (Catchpole et al., 1998; Catchpole et al., 2001). Applications of this approach can be found in Gimenez et al. (2003), Bailey et al. (2004), Schaub et al. (2004), Nasution et al. (2004) and Kery et al. (2005). Methods for dealing with parameter redundancy are reviewed by Gimenez et al. (2005).

With the development of Markov chain Monte Carlo (MCMC) methods, the number of Bayesian analyses in biology has increased in recent years (Ellison, 2004; Clark, 2005), including in population ecology (Brooks et al., 2000). Calculating maximum likelihood estimates is equivalent to finding the mode of the joint posterior distribution of the parameters, given flat priors. Since Bayesians usually examine posterior means or medians, rather than modes, issues of parameter redundancy might not be apparent, and might mistakenly be thought to be of no importance. Moreover, it is frequently admitted that a non-identifiable model is still suitable for inference through its identifiable parameters and information provided a priori (Lindley, 1971). In some cases, the introduction of a suitable informative prior can help in correcting the problem of identifiability (Gelfand and Sahu, 1999; Eberly and Carlin, 2000; Gustafson, 2005; Swartz et al., 2004).

However, weak identifiability, which arises when data supply little information about some parameters, raises problems for a Bayesian approach. First, if any conclusion is based on the examination of weakly identifiable parameters, that conclusion can be misleading (Garrett and Zeger, 2000). Secondly, weak identifiability can result in strong correlations between parameters in the posterior distribution which, in turn, can lead to poor mixing in the MCMC samples and very slow convergence (Carlin and Louis, 1996; Rannala, 2002). Thirdly, even with large sample sizes, the likelihood may be unable to overcome the prior in the presence of weakly identified parameters (Neath and Samaniego, 1997). Finally, an overly informative prior can drive the inference based on the posterior, while a prior too close to improper can yield improper posteriors (Gelfand and Sahu, 1999; Bayarri and Berger, 2004).

In the context of capture-mark-recovery data in particular, Brooks et al. (2000) pointed out that for parameter-redundant models, when the likelihood does not contain information with regard to certain parameters, the information injected a priori can drive the results a posteriori. Additionally, even when uniform priors are used for all of the model parameters, in a particular parameter-redundant model they were able to use the known location of the likelihood ridge to explain how marginal posterior inference can result in precise posterior distributions. We note finally that Gustafson (2005) establishes a number of links between different forms of identifiability.

The focus in this paper is on comparing marginal posterior distributions (denoted π(θ|Y), for data Y and parameter θ) and prior distributions, denoted p(θ). A parameter θ2 is said to be marginally non-identifiable if π(θ2|Y) = p(θ2), where p(θ2) is the prior distribution for θ2. If prior distributions are independent, then marginal non-identifiability is implied by conditional non-identifiability, which is when we have π(θ2|θ1, Y) = p(θ2|θ1).

With the increase in computing power, the temptation is to fit more and more complex models using MCMC methods in a Bayesian framework. This carries the risk of over-parameterization, which is the main motivation of our paper. Our aim is to show how the approach of Garrett and Zeger (2000) for dealing with weak identifiability may be usefully applied to capture-mark-recapture and capture-mark-recovery models. This work is timely, with the increasing use of Bayesian methods in ecology. Although we focus on specific ecological models, our work is easily applicable to any other Bayesian ecological data analysis.

The paper is organized as follows. The next section recalls formal definitions, including weak identifiability. In order to check for weak identifiability, we shall make use of the method proposed by Garrett and Zeger (2000), which consists of computing the percentage of overlap between the prior and posterior distributions. As in Garrett and Zeger (2000), we also display the prior/posterior pair plots as a visual aid. In the third and fourth sections, we illustrate the use of the Garrett and Zeger method by focusing on several data sets involving marked animals. These data and the models fitted have been chosen because the identifiability status is known for all of them from a classical perspective. The last section gives general conclusions and discussion regarding the potential and limitations of the proposed method.

2. Testing for Bayesian identifiability

2.1 Theory

In order to check the identifiability of the parameter θ, Garrett and Zeger (2000) compared its prior distribution to its posterior distribution by directly evaluating the overlap between the two distributions. This quantity has been used in several applications (Garrett et al., 2002; Huang and Bandeen-Roche, 2004; Ten Have et al., 2004; Elliott et al., 2005). The overlap index, denoted τθ, can be computed as

$$\tau_\theta = \int \min\left(p(\theta),\, \pi(\theta \mid Y)\right)\, d\theta. \qquad (1)$$
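As an illustration of Equation (1), the following Python sketch approximates the overlap between a prior and a posterior whose densities are both known in closed form, using a simple midpoint rule on [0, 1]. The function names (`uniform_pdf`, `beta_pdf`, `overlap`) and the two Beta posteriors are hypothetical choices made for this example only; they are not taken from the analyses reported in this paper.

```python
import math


def uniform_pdf(x):
    """Density of the U(0, 1) distribution."""
    return 1.0


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def overlap(prior_pdf, posterior_pdf, n_cells=10_000):
    """Midpoint-rule approximation of Equation (1) on [0, 1], in percent."""
    width = 1.0 / n_cells
    total = 0.0
    for k in range(n_cells):
        x = (k + 0.5) * width  # midpoint of the k-th cell
        total += min(prior_pdf(x), posterior_pdf(x)) * width
    return 100.0 * total


# Hypothetical posteriors: one diffuse, one concentrated.
diffuse = overlap(uniform_pdf, lambda x: beta_pdf(x, 2, 2))
concentrated = overlap(uniform_pdf, lambda x: beta_pdf(x, 50, 50))
print(f"diffuse posterior: {diffuse:.1f}%  concentrated posterior: {concentrated:.1f}%")
```

Under a U(0, 1) prior, the diffuse Beta(2, 2) posterior gives an overlap of roughly 81%, whereas the concentrated Beta(50, 50) posterior falls below the 35% guideline.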

Note that for simplicity, we use percentages throughout. Values of τθ lie between 0 and 100%. When τθ is close to 100%, θ is declared non-identifiable. Garrett and Zeger (2000) suggested an ad hoc threshold of 35%; as those authors noted, however, this value deserves further investigation, in particular through simulation studies. We will return to this point in the Discussion section.

2.2 Bayesian inference

We shall make use of Bayesian theory in conjunction with MCMC methods to estimate the parameters. Because our models involve solely probabilities, we used uniform and beta distributions on the interval [0, 1] as priors. Based on preliminary runs, we generated four chains of length 50000, discarding the first 25000 as burn-in. Convergence was assessed using the Brooks/Gelman/Rubin statistic, also called the potential scale reduction factor, which compares the within- to the between-chain variability of chains started at different and dispersed initial values (Gelman, 1996). We found that the Markov chains exhibit good mixing and moderate autocorrelation. The simulations were performed using WinBUGS (Spiegelhalter et al., 2003). The R (Ihaka and Gentleman, 1996) package R2WinBUGS (Sturtz et al., 2005) was used to call WinBUGS and import the results into R. This was especially helpful when computing the overlap index and plotting the prior-posterior pairs. The R and WinBUGS programs are available from the Biometrics website.

2.3 Practical computation of τ

Surprisingly, Garrett and Zeger (2000) gave no indication of how to compute the overlap index. Here, we show that the computation of τ is relatively straightforward. The coefficient of overlap between two probability distributions is often interpreted as a measure of agreement between them (Bradley et al., 1983), and has had many applications in social sciences (Schmid and Schmidt, 2006), medicine (Reiser and Faraggi, 1999; Stine and Heyse, 2001) and ecology (Kuno, 1992). As a consequence, its computation has been the object of intense research. Recently, Schmid and Schmidt (2006) proposed nonparametric estimators of the coefficient of overlap showing good convergence properties, and based on a combination of kernel density techniques to approximate the probability distributions and quadrature formulas to evaluate the integrals. Following Schmid and Schmidt (2006), we made use of Equation (1) in two steps. First, we replaced the posterior density π(θ|Y) by a kernel density estimator. Second, the integral was evaluated as a by-product of the MCMC simulations, by using a classical Monte Carlo method.

We replaced the posterior distribution π(θ|Y) by a kernel density estimator $\hat f$ based on the MCMC generated values $x_i$, $i = 1, \ldots, n$ (we took $n = 1000$), namely

$$\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\, K\!\left(\frac{x - x_i}{h}\right),$$

where $K$ is a kernel function centered at the data points $x_i$ and $h$ is the bandwidth. We used a standard Gaussian kernel

$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right)$$

with its associated optimal bandwidth $h^* = 1.06\, \hat\sigma\, n^{-1/5}$, where $\hat\sigma = \min(\text{standard deviation},\ \text{interquartile range}/1.34)$ (Silverman, 1986, page 48). Note that if the posterior conditional distributions were of standard form, Rao-Blackwellisation would provide unbiased estimators of the marginal densities of the model parameters (Gelfand and Smith, 1990; Casella and Robert, 1996). We then get a sample from the distribution of the min function in Equation (1) by calculating $y_i = \min(1, \hat f(x_i))$ for all $i$, where 1 is the density of the U[0, 1] prior. Finally, a Monte Carlo approximation to τ is given by

$$\hat\tau = \frac{1}{n} \sum_{i} y_i.$$
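The kernel step described above can be sketched in a few lines of standard-library Python. The kernel and the bandwidth rule follow the text. The final integration step is a swapped-in choice: the sketch evaluates the integral in Equation (1) by a midpoint rule on [0, 1], in the spirit of the quadrature formulas of Schmid and Schmidt (2006), rather than by averaging over the draws. The function names and the simulated "posterior" draws are hypothetical, and no correction is made for the bias of the kernel estimator near the boundaries 0 and 1.

```python
import math
import random
import statistics


def bandwidth(draws):
    """h* = 1.06 * sigma_hat * n^(-1/5), with sigma_hat = min(sd, IQR / 1.34)."""
    n = len(draws)
    sd = statistics.stdev(draws)
    q1, _, q3 = statistics.quantiles(draws, n=4)  # quartiles
    sigma_hat = min(sd, (q3 - q1) / 1.34)
    return 1.06 * sigma_hat * n ** (-0.2)


def kde(draws, h):
    """Return the Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(draws) * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in draws)

    return f_hat


def overlap_from_draws(draws, prior_pdf=lambda x: 1.0, n_cells=200):
    """Approximate Equation (1), in percent, by the midpoint rule on [0, 1]."""
    f_hat = kde(draws, bandwidth(draws))
    width = 1.0 / n_cells
    total = 0.0
    for k in range(n_cells):
        x = (k + 0.5) * width
        total += min(prior_pdf(x), f_hat(x)) * width
    return 100.0 * total


# Hypothetical draws standing in for MCMC output (n = 1000 as in the text).
rng = random.Random(2005)
sharp = [rng.betavariate(60, 40) for _ in range(1000)]  # well-identified case
flat = [rng.random() for _ in range(1000)]              # posterior equal to the prior
print(f"sharp: {overlap_from_draws(sharp):.1f}%  flat: {overlap_from_draws(flat):.1f}%")
```

With draws that simply reproduce the uniform prior, the estimated overlap is high but stays below 100% because of sampling noise and boundary bias in the kernel estimate; with the concentrated draws it falls below the 35% guideline.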

Formann (2003) recommended that the sensitivity of Garrett and Zeger's identifiability criterion should be investigated. Therefore, following Brooks et al. (2000), we adopted Beta(4, 2) and Beta(20, 5) prior distributions in order to assess the sensitivity of our results.

3. Mark-recapture: Application to the Cormack-Jolly-Seber model

We consider the fully time-dependent capture-mark-recapture CJS model originally developed by Cormack (1964), Jolly (1965) and Seber (1965), for which all parameters are time dependent. We define φi as the probability that an animal survives to time ti+1 given that it is alive at time ti, and pj as the probability of being recaptured at time tj. Even though it is parameter-redundant, the model can be useful for analyzing capture-mark-recapture data. We fitted the CJS model to the Dipper data set, which consists of 7 capture occasions (Table 1). The data are the result of a recapture study of both males and females ringed as adults in eastern France, in 1981-1986. In this application, the last survival (φ6) and detection (p7) probabilities are known to be confounded while all the other parameters are estimable.

[Table 1 about here.]

Displays of the prior-posterior distribution pairs for survival and detection probabilities are given in Figure 1. The overlap percentages are given in Table 2 and correspond to the shaded areas in Figure 1.

[Figure 1 about here.]

[Table 2 about here.]

From examining Figure 1, it appears that almost all CJS model parameters are well identified, given the small overlap between the prior and posterior distributions. However, as expected, the last survival and detection probabilities are the exception, their posterior distributions being relatively flat and therefore providing more coverage of the uniform priors. In addition, it should be noted that the first survival and the first detection probabilities are clearly weakly identifiable, due to the fact that very few individuals were marked at the first sampling occasion (approximately 7% of the full data set). For the parameters described, the visual diagnostic of Figure 1 is confirmed by looking at the numerical values of τ in Table 2, which are all greater than 35% when uniform prior distributions are used for all parameters.

Nevertheless, the numerical approach reveals further subtleties. In particular, parameter p3 seems to be marginally weakly identifiable (τp3 = 35.9%). A sensitivity analysis using two other prior distributions for the survival probabilities suggests that this parameter may indeed not be well identified (Table 2). Finally, applying the 35% threshold regardless of the prior distribution gives conclusions that are sensitive to the prior. For example, while the Beta(20, 5) and U[0, 1] priors for survival parameters give a similar answer, using the Beta(4, 2) prior distribution would lead us to conclude that parameters φ4 and φ5 are weakly identifiable. The explanation is as follows: a major flood occurred during the breeding season in 1983, which may have affected the survival rates over the intervals 1982-1983 and 1983-1984. Consequently, the survival probabilities φ2 and φ3 were abnormally low for a small adult passerine, around 0.47, whereas φ4 and φ5 were estimated around 0.8, which is more realistic. The Beta(4, 2) prior, having mean 0.67 and standard deviation 0.18, largely overlaps the posterior distributions we obtained for parameters φ4 and φ5, thus explaining the misleading diagnostic. We will return to this point in the Discussion section.

4. Mark-recoveries: Application to the Freeman-Morgan model

We now focus on probability models for recovery data of dead marked animals. Specifically, we consider a model developed by Freeman and Morgan (1992) (FM model hereafter) involving the survival probability φ1,i of birds in their first year of life, possibly varying with the year i, the probability of survival φa of adult individuals (i.e. with age ≥ 1), constant over time, and constant probabilities of reporting of rings from dead birds in their first year, λ1, or older, λa. Taking the FM model as an illustration, Catchpole et al. (2001) showed that probability models that are not parameter-redundant may still behave poorly when fitted to data. The main reason for what was referred to as near-singularity is that the FM model contains a parameter-redundant model as a particular case, the model with constant first-year survival. As a consequence, the smallest eigenvalue of the information matrix is very small, rather than zero as is the case in parameter-redundant models.

We studied two data sets by means of the FM model, previously analysed by Catchpole et al. (2001). When this model was applied to Mallard data (Section 4.1) using only animals marked as young individuals, very poor results were obtained, with unrealistic estimates of φ1,i and λ1 and large standard errors (Catchpole et al., 2001). In contrast, when the same FM model was fitted to Goose data (Section 4.2), the results matched what was expected (Catchpole et al., 2001). The comparisons were done by reference to a data set comprising individuals marked as young and adults, making the sample size larger and therefore stabilizing the numerical results.

4.1 Mallard data

We fitted the FM model to the Mallard data, which consist of 9 years of recovery (Table 3). The data are the result of a ringing study of males ringed as young in the San Luis Valley, Colorado, 1963-1971.

[Table 3 about here.]

Displays of the prior-posterior distribution pairs for the parameters are given in Figure 2. The percentages of overlap are given in Table 4 and correspond to the shaded areas in Figure 2.

[Figure 2 about here.]

[Table 4 about here.]

The prior-posterior distribution pairs in Figure 2 clearly suggest that all parameters exhibit weak identifiability problems, except two, φa and λa, which have sharp posterior distributions. This is in agreement with previous results obtained by Catchpole et al. (2001) (Table 5a). The examination of Table 4 leads to the same conclusion, with all τ values greater than or around 35%, except for the young reporting rate λ1, which was found to be identifiable using the 35% threshold. However, the sensitivity analysis using two other prior distributions for the survival probabilities suggests that this parameter might be weakly identifiable (Table 4). This is in line with the findings of Catchpole et al. (2001).

4.2 Goose data

We fitted the FM model to the Goose data, which consist of 12 years of recovery (Table 5). The data are the result of a ringing study of individuals ringed as young at Swan Lake Refuge, Missouri, 1949-1957.

[Table 5 about here.]

Displays of the prior-posterior distribution pairs for the parameters are given in Figure 3. The percentages of overlap are given in Table 6 and correspond to the shaded areas in Figure 3.

[Figure 3 about here.]

[Table 6 about here.]

Figure 3 shows that the FM model is well identified by the Goose data, corroborating the results in Table 6, where all values of τ are less than 30%. It should be noted that in this example, both the classical and Bayesian analyses gave very similar point estimates, similar to the results obtained by fitting the same model to a larger data set, in which the recoveries of birds ringed as young are augmented by recoveries from birds ringed as adults (results not shown here). When the other priors were used, all parameters tended to be weakly identifiable, with values of τ greater than 35%, apart from φa, λ1 and λa as in the preceding example, and also φ1,7. The reason why φ1,7 stands out from the other survival probabilities when a non-uniform prior is used is that it is estimated as the smallest survival probability when uniform priors are used, as can be appreciated from Figure 3.

5. Discussion

The results of this paper are only a subset of those that we have obtained for a range of different models and data sets in the general area of mark-recapture-recovery. We have obtained a consistent result that, as suggested by Garrett and Zeger (2000), the prior/posterior overlap threshold of τ = 35% works well for diagnosing weak identifiability in models for the annual survival of wild animals. However, as we saw in the last section, it is important that the approach is confined to the case of a uniform prior distribution. If other priors are used then the values of τ obtained can be influenced by the values of the parameter estimates that result from a classical analysis (or when a uniform prior is used in a Bayesian analysis). It may then become difficult or impossible to calibrate τ.

The conclusions of this paper have been checked using simulated data. For example, we simulated data corresponding to the Dipper data using the maximum-likelihood estimates of the model parameters. We generated 100 data sets with 500 newly marked individuals per occasion. In this case the averaged values of τ are shown in Table 7.

[Table 7 about here.]

We can see again that the value of τ = 35% works well for the case of a uniform prior, and also here for a Beta(4, 2) prior, but fails for the Beta(20, 5) prior. We can see also that when cohort sizes are equal and relatively large, the overlap statistic identifies only the parameters that are structurally redundant.

The difference between using a classical approach to parameter redundancy and estimating the prior/posterior overlap as in this paper is that the latter approach takes account of the effect of the data on the model, as we saw in the Dipper example. In practice it is important to understand both the redundancy structure of individual models and the influence of data. This has been seen in the analyses of Section 4. The estimates of reporting probabilities for the Mallard data are quite different, but in contrast the estimates of first-year survival are not. The opposite is true for the Goose data. What this means is that the fitted model for the Mallard data is closer to the parameter-redundant sub-model with parameters (λ1, λa, φ1, φa) than the estimated model for the Goose data, a feature which is well captured by the use of τ and the 35% overlap threshold for the case of uniform priors; for both of these data sets the same full-rank model is being used.

We believe that the use of a 35% overlap threshold for τ, combined with uniform priors, is an important tool for interpreting the results of Bayesian analyses of mark-recapture-recovery data, and we recommend its general use in the area.
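The simulation check described in this section relies on generating capture histories under the CJS model of Section 3. A minimal sketch of such a generator is given below, assuming a fixed number of newly marked animals released at each occasion except the last. The function name `simulate_cjs` and the parameter values used in the example are hypothetical; in particular, they are not the maximum-likelihood estimates for the Dipper data.

```python
import random


def simulate_cjs(phi, p, n_released, rng):
    """Simulate capture histories under a time-dependent CJS model.

    phi[t]: probability of surviving from occasion t to t + 1 (length T - 1).
    p[t]:   probability of recapture at occasion t + 1, given alive (length T - 1).
    n_released: newly marked animals released at each of the first T - 1 occasions.
    Returns a list of 0/1 lists of length T, one per marked animal.
    """
    n_occasions = len(phi) + 1
    histories = []
    for release in range(n_occasions - 1):
        for _ in range(n_released):
            history = [0] * n_occasions
            history[release] = 1  # marked and released alive
            for t in range(release, n_occasions - 1):
                if rng.random() >= phi[t]:
                    break  # the animal dies and is never seen again
                if rng.random() < p[t]:
                    history[t + 1] = 1  # alive and recaptured
            histories.append(history)
    return histories


# Hypothetical parameter values for 7 occasions, 500 releases per occasion.
rng = random.Random(42)
phi = [0.6, 0.5, 0.5, 0.6, 0.6, 0.6]
p = [0.7, 0.9, 0.9, 0.9, 0.9, 0.9]
data = simulate_cjs(phi, p, n_released=500, rng=rng)
print(len(data), "capture histories, first one:", data[0])
```

Repeating the call with different seeds gives the replicate data sets; each would then be fitted by MCMC and the resulting values of τ averaged.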

Acknowledgements

The authors would like to thank R. Pradel for very stimulating and helpful discussions. O. Gimenez's research was supported by a Marie-Curie Intra-European Fellowship within the Sixth European Community Framework Programme.

Résumé

We review the important ideas of identifiability and parameter redundancy of a model, in the frequentist and Bayesian contexts. The 35% criterion for overlap between the prior and posterior distributions, suggested as an indicator of weak identifiability, is evaluated for a range of mark-recapture and mark-recovery data for which the parameter-redundancy diagnosis is known. The percentage overlap between the prior and posterior distributions is easily obtained from MCMC simulations. We show that if uniform prior distributions are used for all the model parameters, then the suggested criterion works well. We recommend its use in the analysis of data on marked animals, an area that is seeing a growing number of applications of Bayesian methods.

References

Bailey, L. L., Kendall, W. L., Church, D. R. and Wilbur, H. M. (2004). Estimating survival and breeding probability for pond-breeding amphibians: A modified robust design. Ecology 85, 2456–2466.
Bayarri, M. J. and Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science 19, 58–80.
Bradley, E. L., Piantadosi, S. and Inman, H. F. (1983). Overlapping coefficient - a measure of agreement between distributions. Biometrics 39, 801–801.
Brooks, S. P., Catchpole, E. A. and Morgan, B. J. T. (2000). Bayesian animal survival estimation. Statistical Science 15, 357–376.
Brooks, S. P., Catchpole, E. A., Morgan, B. J. T. and Barry, S. C. (2000). On the Bayesian analysis of ring-recovery data. Biometrics 56, 951–956.
Brownie, C., Anderson, D., Burnham, K. P. and Robson, D. S. (1985). Statistical inference from band recovery data: a handbook. Second Edition. U. S. Fish and Wildlife Service, Resource Publication 156. Washington, D. C., USA. 305pp.
Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall/CRC.
Casella, G. and Robert, C. P. (1996). Rao-Blackwellisation of sampling schemes. Biometrika 83, 81–94.
Catchpole, E. A., Freeman, S. N., Morgan, B. J. T. and Harris, M. P. (1998). Integrated recovery/recapture data analysis. Biometrics 54, 33–46.
Catchpole, E. A., Kgosi, P. M. and Morgan, B. J. T. (2001). On the near-singularity of models for animal recovery data. Biometrics 57, 720–726.
Catchpole, E. A. and Morgan, B. J. T. (1997). Detecting parameter redundancy. Biometrika 84, 187–196.
Catchpole, E. A., Morgan, B. J. T. and Freeman, S. N. (1998). Estimation in parameter-redundant models. Biometrika 85, 462–468.
Choquet, R., Reboulet, A. M., Pradel, R., Gimenez, O. and Lebreton, J. D. (2005). M-SURGE: new software specifically designed for multistate capture-recapture models. Animal Biodiversity and Conservation 27, 207–215.
Clark, J. S. (2005). Why environmental scientists are becoming Bayesians. Ecology Letters 8, 2–14.
Cormack, R. M. (1964). Estimates of survival from sighting of marked animals. Biometrika 51, 429–438.
Eberly, L. E. and Carlin, B. P. (2000). Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Statistics in Medicine 19, 2279–2294.
Elliott, M. R., Gallo, J. J., Ten Have, T. R., Bogner, H. R. and Katz, I. R. (2005). Using a Bayesian latent growth curve model to identify trajectories of positive affect and negative events following myocardial infarction. Biostatistics 6, 119–143.
Ellison, A. M. (2004). Bayesian inference in ecology. Ecology Letters 7, 509–520.
Formann, A. K. (2003). Latent class model diagnosis from a frequentist point of view. Biometrics 59, 189–196.
Freeman, S. N. and Morgan, B. J. T. (1992). A modeling strategy for recovery data from birds ringed as nestlings. Biometrics 48, 217–235.
Garrett, E. S., Eaton, W. W. and Zeger, S. (2002). Methods for evaluating the performance of diagnostic tests in the absence of a gold standard: a latent class model approach. Statistics in Medicine 21, 1289–1307.
Garrett, E. S. and Zeger, S. L. (2000). Latent class model diagnosis. Biometrics 56, 1055–1067.
Gelfand, A. E. and Sahu, S. K. (1999). Identifiability, improper priors, and Gibbs sampling for generalized linear models. Journal of the American Statistical Association 94, 247–253.
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.
Gelman, A. (1996). Inference and monitoring convergence. In Gilks, W. R., Richardson, S. and Spiegelhalter, D. J., editors, Markov chain Monte Carlo in practice, pages 131–143. Chapman and Hall.
Gimenez, O., Choquet, R. and Lebreton, J. D. (2003). Parameter redundancy in multistate capture-recapture models. Biometrical Journal 45, 704–722.
Gimenez, O., Viallefont, A., Choquet, R., Catchpole, E. A. and Morgan, B. J. T. (2005). Methods for investigating parameter redundancy. Animal Biodiversity and Conservation 27, 561–572.
Gustafson, P. (2005). On model expansion, model contraction, identifiability and prior information: Two illustrative scenarios involving mismeasured variables. Statistical Science 20, 111–129.
Huang, G. H. and Bandeen-Roche, K. (2004). Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika 69, 5–32.
Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5, 299–314.
Jolly, G. M. (1965). Explicit estimates from capture-recapture data with both death and immigration-stochastic model. Biometrika 52, 225–247.
Kery, M., Gregg, K. B. and Schaub, M. (2005). Demographic estimation methods for plants with unobservable life-states. Oikos 108, 307–320.
Kuno, E. (1992). Competitive-exclusion through reproductive interference. Researches on Population Ecology 34, 275–284.
Lebreton, J. D., Burnham, K. P., Clobert, J. and Anderson, D. R. (1992). Modeling survival and testing biological hypotheses using marked animals: a unified approach with case-studies. Ecological Monographs 62, 67–118.
Lindley, D. V. (1971). Bayesian Statistics: A Review. Philadelphia, SIAM.
Nasution, M. D., Brownie, C., Pollock, K. H. and Powell, R. A. (2004). The effect on model identifiability of allowing different relocation rates for live and dead animals in the combined analysis of telemetry and recapture data. Journal of Agricultural Biological and Environmental Statistics 9, 27–41.
Neath, A. A. and Samaniego, F. J. (1997). On the efficacy of Bayesian inference for nonidentifiable models. American Statistician 51, 225–232.
Rannala, B. (2002). Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology 51, 754–760.
Reiser, B. and Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society Series D - The Statistician 48, 413–418.
Schaub, M., Gimenez, O., Schmidt, B. R. and Pradel, R. (2004). Estimating survival and temporary emigration in the multistate capture-recapture framework. Ecology 85, 2107–2113.
Schmid, F. and Schmidt, A. (2006). Nonparametric estimation of the coefficient of overlapping - theory and empirical application. Computational Statistics and Data Analysis 50, 1583–1596.
Seber, G. A. F. (1965). A note on multiple-recapture census. Biometrika 52, 249–259.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
Spiegelhalter, D., Thomas, A., Best, N. and Lunn, D. (2003). WinBUGS User Manual. Version 1.4 (http://www.mrc-bsu.cam.ac.uk/bugs.). Technical report, Medical Research Council Biostatistics Unit. Cambridge.
Stine, R. A. and Heyse, J. F. (2001). Non-parametric estimates of overlap. Statistics in Medicine 20, 215–236.
Sturtz, S., Ligges, U. and Gelman, A. (2005). R2WinBUGS: A package for running WinBUGS from R. Journal of Statistical Software 12, 1–16.
Swartz, T. B., Haitovsky, Y., Vexler, A. and Yang, T. (2004). Bayesian identifiability and misclassification in multinomial data. The Canadian Journal of Statistics 32, 285–302.
Ten Have, T. R., Elliott, M. R., Joffe, M., Zanutto, E. and Datto, C. (2004). Causal models for randomized physician encouragement trials in treating primary care depression. Journal of the American Statistical Association 99, 16–25.
Viallefont, A., Lebreton, J. D., Reboulet, A. M. and Gory, G. (1998). Parameter identifiability and model selection in capture-recapture models: a numerical approach. Biometrical Journal 40, 313–325.
White, G. C. and Burnham, K. P. (1999). Program MARK: survival estimation from populations of marked animals. Bird Study 46, 120–139.

[Figure 1: one panel per parameter (φ1 to φ6 and p2 to p7), each showing the prior and posterior densities on [0, 1] with their overlap shaded; vertical axis: Density.]

Figure 1. Display of the prior-posterior distribution pairs for φ and p in the CJS model applied to the Dipper data. In order to compute τ , the shaded area has to be calculated (Equation 1). The priors for all parameters were chosen as U (0, 1) distributions.
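As a rough illustration of how the shaded area in Figure 1 might be obtained from MCMC output, the following Python sketch approximates the posterior density by a histogram of sampled values and sums the pointwise minimum of that density and the U(0, 1) prior density. It assumes that Equation 1 defines τ as the area under the minimum of the prior and posterior densities. The function name, the bin count and the simulated Beta draws standing in for sampler output are all illustrative choices, and a kernel density estimate could replace the histogram.

```python
import random


def overlap_with_uniform_prior(samples, n_bins=50):
    """Estimate tau (in percent) between a U(0, 1) prior and the posterior.

    The posterior density is approximated by a histogram of the draws,
    which is a simplification chosen for this sketch.
    """
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for x in samples:
        # A draw equal to 1.0 is placed in the last bin.
        idx = min(int(x / width), n_bins - 1)
        counts[idx] += 1

    n = len(samples)
    prior_density = 1.0  # density of U(0, 1) on every bin
    tau = 0.0
    for c in counts:
        posterior_density = c / (n * width)
        tau += min(prior_density, posterior_density) * width
    return 100.0 * tau


# Hypothetical draws standing in for MCMC output (not the Dipper results).
rng = random.Random(1)
informative = [rng.betavariate(60, 40) for _ in range(20000)]
flat = [rng.betavariate(1, 1) for _ in range(20000)]

tau_informative = overlap_with_uniform_prior(informative)
tau_flat = overlap_with_uniform_prior(flat)
print(round(tau_informative, 1), round(tau_flat, 1))
```

A posterior that is concentrated relative to the prior gives a small τ, while a posterior that is almost as flat as the prior gives a τ close to 100%.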

[Figure 2: twelve panels, one per parameter (φ1,1 to φ1,9, φa, λ1 and λa), each plotting Density against the parameter value on (0, 1).]

Figure 2. Display of the prior-posterior distribution pairs for φ1,i, φa, λ1 and λa in the FM model applied to the Mallard data. In order to compute τ, the shaded area has to be calculated (Equation 1). The priors for all parameters were chosen as U(0, 1) distributions.

[Figure 3: twelve panels, one per parameter (φ1,1 to φ1,9, φa, λ1 and λa), each plotting Density against the parameter value on (0, 1).]

Figure 3. Display of the prior-posterior distribution pairs for φ1,i, φa, λ1 and λa in the FM model applied to the Geese data. In order to compute τ, the shaded area has to be calculated (Equation 1). The priors for all parameters were chosen as U(0, 1) distributions.


Table 1. Recapture data for European dippers (Cinclus cinclus) (data taken from Lebreton et al., 1992).

Year of        Year of recapture (1981+)          Number never
release        1     2     3     4     5     6      recaptured
1981          11     2     0     0     0     0               9
1982                24     1     0     0     0              35
1983                      34     2     0     0              42
1984                            45     1     2              32
1985                                  51     0              37
1986                                        52              46
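To show how the m-array in Table 1 enters a product-multinomial likelihood, here is a minimal Python sketch under the usual time-dependent CJS parameterisation, which is assumed here rather than taken from the table: an animal released at occasion i and first recaptured at occasion j contributes φ_i ... φ_{j-1} (1 - p_{i+1}) ... (1 - p_{j-1}) p_j. The function name and the parameter values are hypothetical.

```python
import math

# Dipper m-array from Table 1: rows are release years 1981..1986,
# columns are first recaptures in years 1982..1987 (upper triangle).
M_ARRAY = [
    [11, 2, 0, 0, 0, 0],
    [0, 24, 1, 0, 0, 0],
    [0, 0, 34, 2, 0, 0],
    [0, 0, 0, 45, 1, 2],
    [0, 0, 0, 0, 51, 0],
    [0, 0, 0, 0, 0, 52],
]
NEVER_RECAPTURED = [9, 35, 42, 32, 37, 46]


def cjs_log_likelihood(phi, p, m_array=M_ARRAY, never=NEVER_RECAPTURED):
    """Product-multinomial log-likelihood, up to an additive constant.

    phi[t]: survival over interval t (phi1..phi6 stored at indices 0..5).
    p[t]:   recapture probability (p2..p7 stored at indices 0..5).
    """
    k = len(m_array)
    loglik = 0.0
    for i in range(k):
        prob_seen_again = 0.0
        for j in range(i, k):
            # Survive intervals i..j, be missed in between, be caught at the end.
            prob = p[j]
            for t in range(i, j + 1):
                prob *= phi[t]
            for t in range(i, j):
                prob *= 1.0 - p[t]
            prob_seen_again += prob
            if m_array[i][j] > 0:
                loglik += m_array[i][j] * math.log(prob)
        loglik += never[i] * math.log(1.0 - prob_seen_again)
    return loglik


# Hypothetical parameter values, chosen only to exercise the function.
phi_first = [0.6] * 5
p_first = [0.9] * 5
loglik_a = cjs_log_likelihood(phi_first + [0.5], p_first + [0.8])
loglik_b = cjs_log_likelihood(phi_first + [0.8], p_first + [0.5])
print(loglik_a, loglik_b)
```

In this sketch φ6 and p7 enter only through their product, so swapping their values leaves the log-likelihood unchanged up to rounding. This is the kind of redundancy that the overlap τ is meant to flag.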

Table 2. Values of τ for the CJS model applied to the Dipper data. As prior distributions, we use U(0, 1) for p throughout, while for the φ parameters, we use U(0, 1), Beta(4, 2) or Beta(20, 5).

              τ (%) under each prior for φ
Parameter      U(0, 1)   Beta(4, 2)   Beta(20, 5)
φ1                53.6         80.2          34.2
φ2                33.4         33.9          32.4
φ3                29.2         32.2          27.2
φ4                27.9         48.3          25.5
φ5                27.4         43.3          25.4
φ6                57.2         82.4          34.5
p2                54.7         52.5          53.1
p3                35.9         37.5          42.4
p4                28.4         29.2          30.5
p5                26.4         27.0          27.7
p6                23.4         23.3          28.9
p7                56.5         52.2          40.7
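The 35% guideline can be applied mechanically to a column of Table 2. The short Python sketch below does so for the U(0, 1) column, assuming the guideline is read as "τ above 35% suggests weak identifiability"; the dictionary keys and the function name are illustrative, and the values are transcribed from the table.

```python
# tau (%) under U(0, 1) priors for all parameters, transcribed from Table 2.
TAU_UNIFORM = {
    "phi1": 53.6, "phi2": 33.4, "phi3": 29.2,
    "phi4": 27.9, "phi5": 27.4, "phi6": 57.2,
    "p2": 54.7, "p3": 35.9, "p4": 28.4,
    "p5": 26.4, "p6": 23.4, "p7": 56.5,
}

THRESHOLD = 35.0  # percent overlap, assumed reading of the guideline


def flag_weakly_identifiable(tau_by_param, threshold=THRESHOLD):
    """Return the parameters whose prior-posterior overlap exceeds the threshold."""
    return [name for name, tau in tau_by_param.items() if tau > threshold]


flagged = flag_weakly_identifiable(TAU_UNIFORM)
print(flagged)  # ['phi1', 'phi6', 'p2', 'p3', 'p7']
```

A flag is only a prompt for closer inspection: p3 sits just above the threshold at 35.9, so borderline values need judgement.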

Table 3. Recovery data for Mallards (Anas platyrhynchos) (data taken from Brownie et al., 1985, p. 58).

Year of             Year of recovery (-1962)             Number never
ringing       1    2    3    4    5    6    7    8    9    seen again
1963         83   35   18   16    6    8    5    3    1           787
1964              103   21   13   11    8    6    6    0           534
1965                    82   36   26   24   15   18    4           927
1966                        153   39   22   21   16    8           942
1967                             109   38   31   15    1          1005
1968                                  113   64   29   22           927
1969                                       124   45   22           940
1970                                             95   25           786
1971                                                  38           315

Table 4. Values of τ for the FM model applied to the Mallard data. As prior distributions, we use U(0, 1) for the λ parameters throughout, while for the φ parameters, we use U(0, 1), Beta(4, 2) or Beta(20, 5).

              τ (%) under each prior for φ
Parameter      U(0, 1)   Beta(4, 2)   Beta(20, 5)
φ1,1              40.2         50.2          56.2
φ1,2              41.5         47.9          47.2
φ1,3              34.6         41.6          45.3
φ1,4              40.1         50.5          54.9
φ1,5              41.7         52.3          57.2
φ1,6              34.7         44.4          57.3
φ1,7              41.5         50.9          59.3
φ1,8              43.4         54.0          59.6
φ1,9              45.3         56.1          69.2
φa                10.7         19.3          14.5
λ1                30.8         31.7          33.7
λa                13.4          8.6           5.0

Table 5. Recovery data for Canada geese (Branta canadensis) (data taken from Brownie et al., 1985, p. 81).

Year of                        Year of recovery (-1948)                       Number never
ringing       1    2    3    4    5    6    7    8    9   10   11   12    seen again
1949         59   31   29   17   19   11   20    5    5    6    3    2           455
1950              29   36   19   17   16    9    4    1    5    0    1           459
1951                   37   33   19   14   21    7    4   10    6    2           420
1952                        53   39   31   37   14    5   10    3    2           482
1953                             51   36   39   12    7   13   12    3           428
1954                                  140   81   22   16   16   10    6           869
1955                                       156   38   22   25   13   11           729
1956                                            105   33   31   21   16           644
1957                                                  39   27   11   10           313

Table 6. Values of τ for the FM model applied to the Geese data. As prior distributions, we use U(0, 1) for the λ parameters throughout, while for the φ parameters, we use U(0, 1), Beta(4, 2) or Beta(20, 5).

              τ (%) under each prior for φ
Parameter      U(0, 1)   Beta(4, 2)   Beta(20, 5)
φ1,1              22.5         37.8          55.3
φ1,2              26.9         39.0          47.1
φ1,3              24.7         40.3          53.5
φ1,4              24.2         39.5          59.5
φ1,5              23.8         39.5          59.2
φ1,6              28.4         34.6          36.4
φ1,7              26.9         27.3          17.5
φ1,8              27.8         36.5          37.8
φ1,9              29.6         44.8          61.6
φa                 7.6         13.8          18.1
λ1                18.4         17.3          25.5
λa                11.9         10.2           6.8

Table 7. Values of τ for the CJS model applied to simulated data based on the dipper data parameter estimates. As prior distributions, we use U(0, 1) for p throughout, while for the φ parameters, we use U(0, 1), Beta(4, 2) or Beta(20, 5).

              τ (%) under each prior for φ
Parameter      U(0, 1)   Beta(4, 2)   Beta(20, 5)
φ1                18.7         29.8          10.3
φ2                11.8         13.1           0.3
φ3                10.7         12.6           0.3
φ4                10.5         15.4           1.9
φ5                 9.9         14.1           1.4
φ6                52.0         73.5          30.1
p2                18.7         18.7          18.4
p3                13.4         13.5          13.8
p4                11.4         11.4          11.5
p5                10.3         10.2          10.3
p6                 9.0          8.9           9.1
p7                51.9         48.5          33.9
