Flexible modeling for stock-recruitment relationships using Bayesian nonparametric mixtures Kassandra Fronczyk, Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA 95064, U.S.A. (Emails:
[email protected],
[email protected]) and Stephan Munch Marine Science Research Center, Stony Brook University Stony Brook, NY 11794, U.S.A.
The stock and recruitment relationship is fundamental to the management of fishery natural resources. However, infering stock-recruitment relationships is a challenging problem because of the limited available data, the collection of plausible models, and the biological characteristics that should be reflected in the model. Motivated by limitations of traditional parametric stock-recruitment models, we propose a Bayesian nonparametric approach based on a mixture model for the joint distribution of logreproductive success and stock biomass. Flexible mixture modeling for this bivariate distribution yields rich inference for the stock-recruitment relationship through the implied conditional distribution of log-reproductive success given stock biomass. The method is illustrated with cod data from six regions of the North Atlantic, including comparison with simpler Bayesian parametric and semiparametric models. KEY WORDS: Dirichlet process; Markov chain Monte Carlo; Multivariate normal mixtures; North Atlantic cod; Stock biomass.
1
1. Introduction The stock-recruitment (S-R) relationship, i.e., the relationship between the stock size and the level of recruitment to that stock is a key aspect of fishery research with direct implications to the management of natural resources. With recent technological advances and geographical expansion, most fishery resources have been driven to low and unproductive levels. In fact, over the last decade the number of overexploited stocks has dramatically increased in most regions of the world (Garcia and de Leiva Moreno, 2003). Consequently, there has been a substantial rise in concern about the basic biological sustainability of fishing. The S-R relationship is used to make decisions on the limits of sustainable fishing and is therefore fundamental to the management of fishery resources. However, modeling and infering S-R relationships is a challenging problem. Data is limited and measured with noise in both stock biomass and estimated recruitment; the S-R relationship is likely to be nonlinear over some ranges of stock sizes; and there are various plausible biological mechanisms that are consistent with very different functional relationships between stock and recruitment. In particular, although many parametric S-R models can be biologically derived, the relationship is usually disguised by environmental variability and difficult to determine with accuracy. The Bayesian paradigm offers a natural modeling framework for S-R relationships. In general, Bayesian approaches enable incorporation of prior knowledge, have the capacity to model different sources of data uncertainty, and build inference from probabilistic modeling, which thus yields appropriate uncertainty quantification of point estimates that does not rely on asymptotic considerations. Indeed, parametric Bayesian techniques are increasingly utilized in fisheries and, more generally, in ecology; see, e.g., Punt and Hilborn (1997), McAllister and Kirkwood (1998), Clark (2005; 2007). 2
However, parametric approaches to S-R modeling, Bayesian and classical alike, may be too restrictive, since they rely on specific parametric forms for the S-R function. Depending on their assumptions, parametric models can produce very different inference and can be highly influenced by extreme observations. Semiparametric and nonparametric estimation methods for S-R relationships avoid strong assumptions implied by parametric approaches, and are thus becoming increasingly popular. Classical nonparametric methods include: construction of the distribution of recruitment given stock biomass through nonparametric density estimators (Evans and Rice, 1988); fitting a locally weighted smoothing function with nonparametric regression and spline methods (Cook, 1998); using neural networks to estimate the S-R function (Chen and Ware, 1999); and estimation of transition probabilities between specified regions in the S-R plane (Brodziak et al., 2002). A practical drawback of these methods involves uncertainty quantification for the estimates of the S-R function and of management reference points resulting from the estimated S-R relationship. This is a direct consequence of the fact that classical nonparametric estimation techniques do not involve probabilistic modeling of the underlying (conditional) distribution of recruitment given stock biomass or of the corresponding joint distribution. Hence, by avoiding potentially suspect parametric distributional forms, i.e., by avoiding likelihood specification, they are inevitably limited to point estimation. When developed, error bounds depend heavily on asymptotic results, which are unreliable because of the small sample sizes typically available for S-R inference. The impetus for a Bayesian nonparametric approach is the same with classical nonparametric methods, that is, to provide inference for the S-R relationship that avoids restrictive parametric assumptions. However, Bayesian nonparametric methodology utilizes a drastically different approach to inference. Instead of avoiding modeling the 3
stochastic mechanism that generates the data, it treats the corresponding distribution as the unknown (infinite-dimensional) parameter, which is assigned a nonparametric prior that supports the space of all plausible distributions. Hence, Bayesian nonparametric methods combine the advantages of Bayesian modeling with the appeal of nonparametric inference. In particular, they provide data-driven, albeit model-based, inference, and, importantly, more reliable predictions than parametric models. To our knowledge, the only application of Bayesian nonparametrics to modeling S-R relationships appears in Munch, Kottas and Mangel (2005) and Patil (2007). This work is based on semiparametric modeling, using the standard regression setting with a normal distribution for recruitment on the log scale, and with a Gaussian process (GP) prior for the S-R function. Although, as shown in Munch et al. (2005), the GP semiparametric model outperforms traditional parametric models, it still includes potentially restrictive modeling aspects as it builds inference from a parametric recruitment distribution, a stationary prior model for the S-R function, and an implicit assumption that stock biomass is observed with negligible measurement errors. Here, we propose a more general fully nonparametric Bayesian modeling approach for S-R relationships, which, while retaining a computationally tractable framework for inference, it obviates the need for the above assumptions. The approach is built from a mixture model for the joint distribution of log-reproductive success, log(R/S), and stock biomass, where R and S denote recruitment and stock size, respectively. (Note that regression approaches to modeling the S-R function typically use log(R/S) and S as the response and covariate, respectively.) Flexible mixture modeling for this bivariate distribution yields rich inference for the S-R relationship through the implied conditional distribution of log(R/S) given S. Key features of the model include its capacity to uncover both non-linear S-R relationships as well as non-standard shapes 4
for the conditional density of log-reproductive success. We illustrate the utility of the proposed nonparametric model by applying it to cod data from six regions of the North Atlantic, and comparing inference results with the semiparametric GP model discussed above, and the Ricker model, a commonly utilized parametric model. The outline of the paper is as follows. Section 2 provides background on the North Atlantic cod data. Section 3 introduces the Bayesian nonparametric mixture model, and Section 4 presents the results from the North Atlantic cod data, including the model comparison study. Finally, Section 5 concludes with a discussion.
2. Data Description Cod are demersal marine fish that, historically, matured around 7 years and live for several decades (see, e.g., Barot et al., 2004). Large females produce millions of eggs and the total number of eggs produced is roughly indexed by the total biomass of spawning individuals. Recruitment refers to the number of individuals in the first age class which are fully susceptible to fishing. Survival from the egg stage to the age at recruitment is density dependent leading to a non-linear relationship between stock biomass and recruitment. We consider data on recruitment and stock biomass for cod from six North Atlantic regions: NE Arctic, Icelandic, Irish Sea, Faroe, Skagerrak, and West of Scotland. Each region has data dating from as far back as 1946 through 2004. Recruitment and biomass estimates were determined by virtual population analysis (Hilborn and Walters, 1992) of commercial landings data and fishery independent research surveys (ICES, 2005). Virtual population analysis is essentially an accounting procedure used to determine the number of fish that must have been in the population given the numbers that have been removed over time and an independent estimate of natural mortality. These data have been analyzed previously by many authors (see, 5
e.g., Brander and Mohn, 2004; Stige et al., 2006, and further references therein). As discussed in the Introduction, in the S-R regression setting, the response of interest is typically log-reproductive success, y = log(R/S), and the covariate x = S. Hence, the data comprises {(yi , xi ) : i = 1, . . . , n}, where yi denotes the logreproductive success for year i, and xi denotes the stock biomass for year i. The data from the six regions range in length from 27 to 59 years and, as shown in Figure 1, indicate different functional forms for the S-R relationship.
3. Bayesian Nonparametric Modeling for Stock-Recruitment Relationships Section 3.1 presents the nonparametric mixture modeling approach for the S-R relationship. The methods for posterior inference are discussed in Section 3.2.
3.1 The Mixture Modeling Approach To develop a flexible inferential framework for S-R relationships, we propose a Bayesian nonparametric mixture model for the joint distribution of log-reproductive success, y = log(R/S), and stock biomass, x = S. In particular, we model this bivariate distribution with a Dirichlet process (DP) mixture of bivariate normals mixing on both the mean vector and covariance matrix parameters, Z f (y, x; G) =
N2 (y, x; µ, Σ)dG(µ, Σ), G ∼ DP(α, G0 ),
(1)
where N2 (µ, Σ) will denote the bivariate normal density or distribution (depending on the context) with mean vector µ and covariance matrix Σ. Moreover, DP(α, G0 ) denotes the DP prior (Ferguson, 1973) for the random mixing distribution G, defined in terms of a parametric centering (base) distribution G0 , and parameter α > 0, which controls the variability of G about G0 with larger values of α resulting in realizations 6
G that are closer to G0 . We take G0 (µ, Σ; m, V, Q) = N2 (µ; m, V )IW(Σ; v, Q). Here, IW(v, Q) denotes the inverse Wishart distribution for the 2×2 (positive definite) matrix Σ with density proportional to |Σ|−(v+3)/2 exp{−0.5tr(QΣ−1 )}. Using its constructive definition (Sethuraman, 1994), the DP prior generates countable mixtures of point masses with locations drawn from the base distribution and weights defined by a stick-breaking process. Specifically, a G drawn from DP(α, G0 ) has an almost sure representation as G(·) =
∞ X
ωl δθl (·),
(2)
l=1
where δa denotes a point mass at a, the θ l = (µl , Σl ) are i.i.d. from G0 , and ω1 = ζ1 , Q ωl = ζl l−1 m=1 (1 − ζm ) for l ≥ 2, with ζl i.i.d. from Beta(1,α) (independently of the θ l ). The mixture model in (1) is nonparametric, since instead of specifying the mixing distribution G through a particular parametric family, a nonparametric prior is placed on G, which allows the data to determine the form of G and thus of the resulting joint mixture density for log-reproductive success and stock biomass. In this regard, the discreteness of G, implied by its DP prior, is a key feature as, given the data (yi , xi ), it enables flexible shapes for the mixture density f (y, x; G) through data-driven clustering of the mixing parameters associated with the (yi , xi ). We next discuss the full probability model for the data = {(yi , xi ) : i = 1, . . . , n}. To develop a hierarchical model that facilitates Markov chain Monte Carlo (MCMC) posterior simulation, we follow a standard approach based on a truncation approximation of G (e.g., Ishwaran and James, 2001). In particular, using the DP stickbreaking definition in (2), we approximate the countable discrete mixing distribution PN G in (1) with a finite dimensional version defined as GN (·) = l=1 pl δθ l (·). Here, the θ l = (µl , Σl ), l = 1, ..., N , are i.i.d. from G0 , and the random weights, p = 7
(p1 , ..., pN ), arise from the stick-breaking process. Specifically, with V1 , . . . , VN −1 i.i.d. Q from Beta(1, α), we have p1 = V1 , and for l = 2, ..., N − 1, pl = Vl l−1 m=1 (1 − Vm ), P −1 QN −1 resulting in pN = 1 − N p = l l=1 m=1 (1 − Vm ). The induced joint density, f (p; α), for the random weights corresponds to a generalized Dirichlet distribution given in Appendix A. Regarding the choice of N , an appropriate truncation value can be found by considering the behavior of the higher order weights ωl in (2). For instance, for UN = P∞ N −1 . Given a suitable tolerance l=N ωl it can be shown that E(UN ) = {α/(α + 1)} level and a prior estimate for α, this expression provides the corresponding truncation value N . For the analysis of the North Atlantic cod data in Section 4, the truncation value was set at N = 25 (larger values of N did not significantly change posterior inference results). The standard hierarchical model formulation for the data involves mixing pa˜ i ), i = 1, ..., n, such that the (yi , xi ), given (µ˜i , Σ ˜ i ), are independent rameters (µ˜i , Σ ˜ i ), with the (µ˜i , Σ ˜ i ), given G, arising i.i.d. from G. Under the truncation N2 (yi , xi ; µ˜i , Σ approximation, GN , to G, latent configuration variables L = (L1 , . . . , Ln ) are introduced to break the finite mixture approximation to the countable DP mixture model. ˜ i) = In particular, each Li takes a value in {1, . . . , N } with Li = l signifying that (µ˜i , Σ (µl , Σl ), for i = 1, . . . , n and l = 1, . . . , N . Hence, including the configuration variables and replacing G with GN ≡ (p, θ), where θ = (θ 1 , ..., θ N ), the model becomes (yi , xi ) | θ, Li
ind
∼
Li | p
i.i.d.
∼
N2 (yi , xi ; µLi , ΣLi ), i = 1, . . . , n N X pl δl (·), i = 1, . . . , n
p|α
∼
f (p; α)
l=1
θ l | m, V, Q
i.i.d.
∼
G0 (θ l ; m, V, Q), l = 1, . . . , N.
(3)
The model is completed with priors for α and for the hyperparameters (m, V, Q) 8
of G0 . We place a gamma(aα , bα ) prior on α, and take N2 (am , Bm ), IW(aV , BV ), and W(aQ , BQ ) priors for m, V , and Q, respectively, where W(aQ , BQ ) is a Wishart distribution for the 2 × 2 (positive definite) matrix Q with density proportional to −1 |Q|(aQ −3)/2 exp{−0.5tr(QBQ )} (with aQ ≥ 2). Specification of the hyperprior param-
eters is discussed in Appendix A. We note that multivariate normal DP mixture priors, of the form in (1), have been applied in various settings following the work of M¨ uller, Erkanli and West (1996) on multivariate density estimation and curve fitting. However, the scope of inference has been typically limited to posterior point estimates, obtained through posterior predictive densities, p(y, x | data) ≡ E(f (y, x; G) | data). This is especially restrictive for regression applications where posterior predictive densities can only provide approximations to, say, E(f (y | x; G) | data) and E(y | x; G), which would be natural point estimates for the conditional response density and the regression function, respectively. Proper point estimates, and more importantly, associated uncertainty quantification require the posterior of the random mixing distribution G, which is readily available (through its truncation approximation) under model (3). Details in the context of our application are given in Section 3.2. Modeling and inference for general regression problems, using DP mixtures, is studied by Taddy (2008); an application in the context of quantile regression is considered in Taddy and Kottas (2009).
3.2 Posterior Inference We use blocked Gibbs sampling (e.g., Ishwaran and James, 2001) to obtain the posterior distribution p(p, θ, L, α, m, V, Q | data) corresponding to model (3). The blocked Gibbs sampler updates parameters using draws from standard distributions, the details of which are given in Appendix A. 9
Each posterior sample for (p, θ) provides a posterior realization for GN (·) directly P through its definition, N l=1 pl δ(µl ,Σl ) (·). Next, for any specified point (y0 , x0 ) in the (log-reproductive success, stock biomass) space, f (y0 , x0 ; GN ) =
N X
pl N2 (y0 , x0 ; µl , Σl )
l=1
is a realization from the posterior of the joint mixture density for log-reproductive sucP x xx cess and stock biomass at point (y0 , x0 ). Moreover, f (x0 ; GN ) = N l=1 pl N(x0 ; µl , Σl ) is a posterior realization for the marginal stock biomass density at x0 . Here, µxl and Σxx l are the mean and variance of the marginal normal distribution for x induced by the joint N2 (y, x; µl , Σl ) distribution. Hence, inference for the entire conditional density of log-reproductive success at any fixed value, x0 , of stock biomass is available through f (y | x0 ; GN ) = f (y, x0 ; GN )/f (x0 ; GN ). Whereas traditional S-R models only allow for unimodal log-reproductive success densities, the proposed DP mixture model can capture general unimodal and multimodal shapes. As illustrated with the North Atlantic cod data in Section 4, the model yields full inference for conditional response densities with shapes that can change with different stock biomass values. Inference for the S-R relationship can be obtained through any central feature of the conditional log-reproductive success distribution. For instance, the mean regression R function, E(y | x; GN ) = {f (x; GN )}−1 yf (y, x; GN )dy, can be expressed as N
X 1 y yx xx−1 E(y | x; GN ) = pl N(x; µxl , Σxx (x − µxl )}, l ){µl + Σl Σl f (x; GN ) l=1
(4)
where µyl is the marginal mean for y, and Σyx l is the covariance between y and x arising from the joint N2 (y, x; µl , Σl ) distribution. A key inferential objective when modeling S-R relationships is estimation of reference points that are used in fisheries management decisions. To illustrate such inference under the DP mixture modeling approach, we consider an important reference 10
point, the unfished biomass, B0 , which is traditionally calculated as the solution to R(S) = M S. Here, R(S) denotes the S-R relationship, and M represents the fraction of the fish population killed by natural causes. Hence, B0 corresponds to the stock biomass value that satisfies restriction log(R(S)/S) = log M , and thus under the joint DP mixture model for (log(R/S), S), the density for B0 is given by f (x | y = log M ; GN ). The posterior distribution for this conditional density can be obtained as above, working in this case with the joint mixture density and the marginal log-reproductive success density. As with the conditional response density, if the situation warrants, the model can uncover non-standard shapes in the B0 density.
4. Data Illustrations Section 4.1 presents results from the analysis of the North Atlantic cod data under the DP mixture modeling approach of Section 3. In Section 4.2, we consider comparison with simpler parametric and semiparametric S-R models.
4.1 Results for the North Atlantic Cod Data For each of the North Atlantic regions, we fit the DP mixture model in (3) to the corresponding cod data. Figure 1 plots point estimates (posterior means) and 95% interval estimates for the mean S-R regression function in (4). The model uncovers the diverse conditional mean log-reproductive success relationships for each of the six regions. The NE Arctic, Icelandic, and Faroe regions have roughly negative linear relationships. Each of them shows evidence of a slight cubic curve within a small portion of the range of stock biomass values, in particular, for smaller biomass values for the NE Arctic and Icelandic regions and for higher values of the Faroe region. The posterior uncertainty bands for the NE Arctic and Icelandic regions are tighter than those of the 11
Faroe region. The Irish Sea region has an unusual non-linear relationship with interval estimates that widen near the extremes of the data, while the Skagerrak region has an approximately quadratic shape, with the exception of the curve near the maximum stock biomass level, and uncertainty bands that are evenly spread throughout the stock biomass range. The last region, West of Scotland, has a relatively weak signal; in fact, it corresponds to the smallest sample size with only 27 data points. To illustrate inference for the conditional density of log-reproductive success, we consider the NE Arctic and West of Scotland regions, and in Figure 2, show posterior point and 95% interval estimates for the conditional response density at four fixed stock biomass values. At lower biomass levels, the log-reproductive success densities for the NE Arctic region depict obvious departures from normality; at S = 200, 000, the density is bimodal, whereas at S = 350, 000, it has a heavy left tail. Inspection of the data from this region suggests that this is a plausible feature for the log-reproductive success distribution rather than an artifact of the flexible prior model. The higher values of stock biomass result in more standard unimodal response densities. The logreproductive success densities from the West of Scotland region are unimodal across the range of stock biomass values, with roughly the same amount of dispersion. Evidently, such density shapes would be successfully uncovered by traditional parametric models. Hence, a key feature of the nonparametric DP mixture model is that it has the capacity to capture both non-standard log-reproductive success density shapes as well as nonlinear S-R relationships. And, as importantly, when the data do not support such non-standard features, it yields inferences that are comparable with the ones that would result from commonly utilized parametric S-R models. For each of the six regions, Figure 3 provides posterior point and 95% interval estimates of the density for unfished biomass, B0 , obtained as discussed in Section 12
3.2. In keeping with prior research on cod (e.g., Eero et al., 2007), we set the natural mortality rate M = 0.2. All the results are consistent with the data as becomes clear by comparing Figures 3 and 1 (noting where value log(0.2) lies in the panels of the latter figure). The DP mixture model yields multimodal B0 posterior density estimates for the NE Arctic, Icelandic, and Irish Sea regions. On the other hand, it results in more conventional estimates in the Faroe region. For both the Skagerrak region and the West of Scotland region, log(0.2) is outside the range of observed log-reproductive success values. This is reflected in the point estimates for the B0 density, which are very dispersed with wide associated uncertainty bands.
4.2 Comparison Study Here, we consider model comparison working with the standard parametric Ricker model (Ricker, 1954), and with a semiparametric model, which is simpler than the fully nonparametric DP mixture model. We report results based on the data from the NE Arctic and Irish Sea regions. We perform a Bayesian analysis of the Ricker model, Ri = aSi exp(−bSi )vi , where a > 0 and b > 0, and the vi are independent multiplicative errors, typically, assumed to arise from a lognormal distribution. Hence, the Ricker model can be transformed to a linear regression model of log(R/S) on S, i.e., i.i.d.
yi = β0 − β1 Si + εi , εi | σ 2 ∼ N(0, σ 2 ),
i = 1, ..., n
where yi = log(Ri /Si ), β0 = log(a), and β1 = b > 0. We assign a normal prior to β0 , a gamma prior to β1 , and an inverse gamma prior to σ 2 . MCMC sampling for (β0 , β1 , σ 2 ) is straightforward as discussed in Appendix B. Inference for the S-R relationship follows directly from the posterior draws for parameters β0 and β1 . 13
We also fit a Bayesian semiparametric model using a Gaussian process (GP) prior for the S-R function. Specifically, i.i.d.
yi = h(Si ) + εi , εi | σ 2 ∼ N(0, σ 2 ),
i = 1, ..., n
where the S-R function, h(·), is assigned a GP(µ(S), C(S, S 0 )) prior (see, e.g., Neal, 1998, on Bayesian GP regression). We take a constant mean function, µ(S) = µ, and an isotropic exponential covariance function given by C(S, S 0 ) = τ 2 exp{−φ|S − S 0 |}, where τ 2 is the GP variance, and φ > 0 controls how rapidly the correlation decreases with distance between biomass values. We place a normal prior on µ, inverse gamma priors on σ 2 and τ 2 , and a uniform prior on φ. The GP regression model adds flexibility relative to traditional S-R models by placing a nonparametric prior on the S-R function rather than assuming a specific parametric form. It was studied by Munch et. al. (2005) in the context of modeling S-R relationships, though the scale used there involved y = log R and x = log S. Appendix B provides details on prior specification and MCMC posterior simulation for the GP model parameters as well as on posterior predictive inference for function h(·). Figure 4 shows posterior mean and 95% interval estimates for the S-R function: β0 − β1 S under the parametric Ricker model (left column); h(S) under the semiparametric GP model (middle column); and E(log(R/S) | S; GN ) under the nonparametric DP mixture model (right column). To ensure a fair comparison, the priors were chosen such that a priori the conditional densities of log-reproductive success given stock biomass are comparable across the three models, and, in fact, corresponding to a fairly noninformative prior specification. For instance, for all three models and both regions, the prior estimate (prior mean) for conditional mean log-reproductive success was roughly constant (in S) around 0, and the associated prior interval estimates were including 14
values for log(R/S) in the range from −4 to 4. In the NE Arctic region, the Ricker model is roughly equivalent to both the GP and the DP mixture model. The GP regression curve generates slightly more variability than the DP model, but both include a small cubic trend at the smaller stock biomass values. However, the Irish Sea region gives noticeably different regression curves using the Ricker model and either of the GP or the DP mixture model. The Ricker model is unable to depart from linearity and is sensitive to the outlying observations. The GP model produces a more accurate regression curve than the Ricker model, with an approximately negative slope and modest curvature. The associated uncertainty bands have roughly the same width across the range of biomass values, regardless of the amount of data in the surrounding area. This is partly due to the fact that the Irish Sea region data are spread relatively evenly through the range of stock sizes, but it can also be attributed to the GP prior stationary covariance function. The DP mixture model does not rely on stationarity assumptions and therefore uncertainty bands become wide or narrow based on how much data information is available about the particular biomass area. Regarding inference for the log-reproductive success response distribution, the nonparametric model clearly outperforms the other two models. Both of the parametric and semiparametric models are built from the standard additive regression setting treating the stock biomass covariate as fixed, and assuming a normal response density with constant variance. Even if the error distribution is replaced with more flexible parametric families, neither of the two models would be able to recover the density shapes for the NE Arctic region (Figure 2) revealed by the DP mixture model. A further key advantage of the DP mixture model is with regard to inference for the unfished biomass management reference point. Although the Ricker and GP models 15
produce, in principle, a posterior distribution for B0 , the resulting inference is not as flexible as under the DP mixture model and, in fact, may not be attainable in practice. The Ricker model can be solved analytically for B0 , in particular, B0 = (β0 −log(M ))/β1 . However, this expression is not guaranteed to be positive, may result in values that are far outside the range of the data, and yields typically a unimodal distribution. For the GP model, B0 corresponds to the value of stock biomass at which h(S) = log(M ). Therefore, for each posterior realization of the h(S) curve, B0 can be obtained by finding values S0 and S1 such that h(S0 ) ≤ log(M ) ≤ h(S1 ) and interpolating between these grid points. The GP model is unable to provide results when the curve does not attain the value of log(M ), as in the case of the Skagerrak and West of Scotland regions. To formally compare the three models, we use a version of the minimum posterior predictive loss criterion from Gelfand and Ghosh (1998). The criterion favors the model, m, that minimizes the predictive loss measure, D(m) = P (m) + G(m) =
n X
(m)
Var
(ynew,i | data) +
n X
{yi − E(m) (ynew,i | data)}2 .
i=1
i=1
Here, E(m) (ynew,i | data) and Var(m) (ynew,i | data) are the mean and variance, respectively, under model m, of the posterior predictive distribution for replicated response ynew,i with associated stock biomass xi . Hence, G(m) is a goodness-of-fit term, and P (m) acts as a penalty term for model complexity quantified through inflated predictive variances. Computing D(m) for the Ricker and GP models is straightforward as discussed in Appendix B. Under the DP mixture model, estimation of D(m) is based on expectations with respect to the conditional posterior predictive distribution of log-reproductive success given stock biomass; details are given in Appendix A. Table 1 reports results for D(m) under the three models. The goodness-of-fit term 16
G(m) is comparable across the three models in the NE Arctic region, but the penalty terms in the Ricker and GP models are drastically larger than the DP mixture model. In the Irish Sea region, the Ricker model performs poorly in both the goodness-of-fit and the penalty term. The GP model produces a smaller goodness-of-fit term, but a substantially larger penalty term than the DP mixture model. Under the minimum posterior predictive loss criterion, the Ricker model is slightly favored over the GP model in the NE Arctic region, whereas the GP model fares better than the Ricker model with the more complicated data in the Irish Sea region. For both regions, the DP mixture model outperforms both the parametric Ricker and semiparametric GP models.
5. Discussion We have presented a Bayesian nonparametric method to flexibly model the stockrecruitment (S-R) relationship, which is a key aspect of fishery research. A distinguishing feature of the modeling approach is that it incorporates uncertainty in both the log-reproductive success and the stock biomass values. The corresponding joint distribution was modeled with a nonparametric Dirichlet process (DP) mixture. The implied conditional distribution of log-reproductive success given stock biomass values can be used for a wide range of practically important inferences. In particular, working with cod data from six regions of the North Atlantic, we illustrated the capacity of the model to uncover non-linear S-R relationships in the presence of small to moderate sample sizes. The DP mixture model also yields full (and exact) inference for conditional log-reproductive success densities with shapes that can change with different values in the biomass space. Moreover, by reversing the order of conditioning, i.e., working with the conditional distribution of stock biomass given values of log-reproductive success 17
specified through natural mortality rates, we obtained inference for unfished biomass, an important management reference point. Again, the DP mixture model resulted in flexible estimation for unfished biomass with appropriate quantification of the associated posterior uncertainty. Particular emphasis was placed on comparison of inference results with simpler parametric and semiparametric models from the fisheries literature. The DP mixture model outperformed those models for the North Atlantic cod data, based on both empirical (graphical) comparison and more formal comparison, using a minimum posterior predictive loss criterion. Although the need to model S-R relationships with time series methods has long been recognized (Walters, 1985) and related aproaches have been proposed (e.g., Millar and Meyer, 2000), others have noted that time series bias is small for productive stocks (Myers and Barrowman, 1995). Indeed, the majority of the fisheries literature continues to employ standard regression techniques. Our objective in this paper is to compare with this literature and to offer a more general modeling strategy. Current work seeks to develop structured semiparametric state-space S-R models as well as temporal extensions of the proposed fully nonparametric mixture model. A further extension of both methodological and practical interest involves building a nonparametric hierarchical model for the largely varying S-R relationships for cod over the North Atlantic regions. The hierarchical model must be sufficiently flexible to capture the diverse S-R relationships across the different regions, while incorporating available spatial information about neighboring regions. In this regard, an important covariate is sea surface temperature, which is measured at a grid within each region. Therefore, it can be incorporated in the model formulation by averaging over the whole region, or more generally, by integrating an underlying spatial model over the grid of each region. Results from this research will be reported in a future article. 18
ACKNOWLEDGEMENTS The work of A. Kottas and K. Fronczyk was supported in part by NSF grant DEB0727543; the work of S. Munch was supported in part by NSF grant DEB-0727312.
REFERENCES Barot, S., Heino, M., O’Brien, L., and Dieckmann, U. (2004). Long-term trend in the maturation reaction norm of two cod stocks. Ecological Applications 14, 1257-1271. Brander, K. and Mohn, R. (2004). Effect of the North Atlantic Oscillation on recruitment of Atlantic cod (Gadus morhua). Canadian Journal of Fisheries and Aquatic Sciences 61, 1558-1564. Brodziak, J.K.T., Overholtz, W.J., and Rago, P.J. (2002). Reply: Does spawning stock affect recruitment of New England groundfish? Interpreting spawning stock and recruitment data in New England groundfish. Canadian Journal of Fisheries and Aquatic Sciences 59, 193-195. Chen, D.G. and Ware, D.M. (1999). A neural network model for forecasting fish stock recruitment. Canadian Journal of Fisheries and Aquatic Sciences 56, 2385-2396. Clark, J.S. (2005). Why environmental scientists are becoming Bayesians. Ecology Letters 8, 2-14. Clark, J.S. (2007). Models for Ecological Data: An Introduction. Princeton University Press.
19
Cook, R.M. (1998). A sustainability criterion for the exploitation of North Sea cod. ICES Journal of Marine Science 55, 1061-1070. Eero, M., K¨oster, F.W., Plikshs, M. and Thurow, F. (2007). Eastern Baltic cod (Gadus morhua callarias) stock dynamics: extending the analytical assessment back to the mid-1940s. ICES Journal of Marine Science 64, 1257-1271. Evans, G.T. and Rice, J.C. (1988). Predicting recruitment from stock size without the mediation of a functional relation. Journal du Conseil - Conseil International pour l’Exploration de la Mer 44, 111-122. Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics 1, 209-230. Garcia, S.M., and de Leiva Moreno, J.I. (2003). Global Overview of Marine Fisheries. In Responsible Fisheries in the Marine Ecosystem, Sinclair, M., and Valdimarsson, G. (eds). FAO and CABI Publishing, pp. 1-24. Gelfand, A. and Ghosh, S. (1998). Model choice: A minimum posterior predictive loss approach. Biometrika 85, 1-11. Hilborn, R. and Walters, C.J. (1992).
Quantitative Fisheries Stock Assessment:
Choice, Dynamics, and Uncertainty. Chapman and Hall. New York. ICES (2005). Report of the ICES Advisory Committee on Fishery Management, Advisory Committee on the Marine Environment and Advisory Committee on Ecosystems, 2005. ICES Advice. Vol. 1 No. 11. Ishwaran, H. and James, L.F. (2001). Gibbs Sampling for Stick-Breaking Priors, Journal of the American Statistical Association 96, 161-73. 20
McAllister, M.K. and Kirkwood, G.P. (1998). Bayesian stock assessment: a review and example application using the logistic model. ICES Journal of Marine Science 55, 1031-1060. Millar, R.B., and Meyer, R. (2000). Non-linear state space modelling of fisheries biomass dynamics by using Metropolis-Hastings within-Gibbs sampling. Applied Statistics 49, 327-342. M¨ uller, P., Erkanli, A. and West, M. (1996). Bayesian curve fitting using multivariate normal mixtures. Biometrika 83, 67-79. Munch, S.B., Kottas, A., and Mangel, M. (2005). Bayesian nonparametric analysis of stock-recruitment relationships. Canadian Journal of Fisheries and Aquatic Sciences 62, 1808-1821. Myers, R.A.M., and Barrowman, N.J. (1995). Time series bias in the estimation of density dependent mortality in stock-recruitment models. Canadian Journal of Fisheries and Aquatic Sciences 52, 223-232. Neal, R.M. (1998). Regression and classification using Gaussian process priors. In Bayesian statistics 6: Proceedings of the sixth Valencia international meeting (eds J.M. Bernardo, J.O. Berger, A.P. Dawid & A.F.M. Smith). Oxford University Press, pp. 475-501. Patil, A. (2007). Bayesian Nonparametrics for Inferences of Ecological Dynamics. Ph.D. Dissertation, Department of Applied Mathematics and Statistics, University of California, Santa Cruz. Punt, A. and Hilborn, R. (1997). Fisheries stock assessment and decision analysis: The Bayesian approach. Reviews in Fish Biology and Fisheries 7, 35-65. 21
Ricker, W.E. (1954). Stock and recruitment. Journal of the Fisheries Research Board 11, 559-623. Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4, 639-650. Stige, L.C., Ottersen, G., Brander, K., Chan, K.S., and Stenseth, N.C. (2006). Cod and climate: effect of the North Atlantic Oscillation on recruitment in the North Atlantic. Marine Ecology Progress Series 325, 227-241. Taddy, M. (2008). Bayesian Nonparametric Analysis of Conditional Distributions and Inference for Poisson Point Processes. Ph.D. Dissertation, Department of Applied Mathematics and Statistics, University of California, Santa Cruz. Taddy, M. and Kottas, A. (2009). A Bayesian nonparametric approach to inference for quantile regression. To appear in Journal of Business and Economic Statistics. Walters, C.J. (1985). Bias in the estimation of functional relationships from time series data. Canadian Journal of Fisheries and Aquatic Sciences 42, 147-149.
22
Icelandic
Irish Sea
-0.5
log(R/S)
-1.0
-0.5 -1.0
log(R/S)
1 -1
-2.0
-1.5
-1.5
0
log(R/S)
0.0
0.0
2
0.5
0.5
NE Arctic
4.0e+05
6.0e+05
8.0e+05
1.0e+06
1.2e+06
2e+05
4e+05
6e+05
8e+05
5.0e+03
1.0e+04
1.5e+04
S (1946-2004)
S (1955-2004)
S (1968-2004)
Faroe
Skagerrak
West of Scotland
3
2.0e+04
2 log(R/S)
1.5
log(R/S)
-2.5
0.0
0
0.5
1
1.0
-1.5 -2.0
log(R/S)
-1.0
2.0
-0.5
2.5
2.0e+05
2.0e+04
4.0e+04
6.0e+04
8.0e+04
S (1961-2004)
1.0e+05
1.2e+05
5.0e+04
1.0e+05
1.5e+05 S (1963-2004)
2.0e+05
2.5e+05
5.0e+03
1.0e+04
1.5e+04
2.0e+04
2.5e+04
3.0e+04
3.5e+04
S (1978-2004)
Figure 1: For each of the North Atlantic regions, posterior mean (solid lines) and 95% interval estimates (dashed lines) for the conditional mean log-reproductive success, overlaid on plot of the data for stock biomass (S) and log-reproductive success (log(R/S)). The label in each panel indicates for each region the time interval in years with available data.
23
0
1
2
3
-1
0
1
2
3
4 0
1
2
NE Arctic
3
4 3 1 0 -2
-2
-1
0
1
2
3
-2
-1
0
1
2
log(R/S)
Fixed x=7,500
Fixed x=12,500
Fixed x=20,000
Fixed x=30,000
2 log(R/S)
3
4
0
1
2
3
4
3 1 0
0
log(R/S)
2
West of Scotland
3 0
1
2
West of Scotland
3 0
1
2
West of Scotland
3 2 1
1
3
4
log(R/S)
4
log(R/S)
4
log(R/S)
0
0
Fixed x=900,000
2
NE Arctic
3 0
1
2
NE Arctic
3 2
NE Arctic
1 0
-1
4
-2
West of Scotland
Fixed x=600,000
4
Fixed x=350,000
4
Fixed x=200,000
1
2 log(R/S)
3
4
0
1
2
3
4
log(R/S)
Figure 2: For the NE Arctic region (top row) and the West of Scotland region (bottom row), posterior mean (solid lines) and 95% interval estimates (dashed lines) of the conditional density of log-reproductive success (log(R/S)) given four fixed stock biomass values (indicated in the label of each panel).
24
Icelandic
4e-04
1.5e-05 2.0e+05
4.0e+05
6.0e+05
8.0e+05
1.0e+06
1.2e+06
0e+00
2e+05
4e+05
6e+05
8e+05
1e+06
0.0e+00
5.0e+03
1.0e+04
1.5e+04
Bo
Bo
Bo
Faroe
Skagerrak
West of Scotland
0e+00
1.0e-04
0e+00
0.0e+00
1e-05
2e-05
2e-05
2.0e-04
3e-05
6e-05 4e-05
2.0e+04
3.0e-04
4e-05
0.0e+00
0e+00
0.0e+00
1e-04
5.0e-06
2e-04
3e-04
1.0e-05
1.0e-05 5.0e-06 0.0e+00
Irish Sea 5e-04
1.5e-05
NE Arctic
0.0e+00
2.0e+04
4.0e+04
6.0e+04
8.0e+04
1.0e+05
1.2e+05
0.0e+00
Bo
5.0e+04
1.0e+05
1.5e+05 Bo
2.0e+05
2.5e+05
0e+00
1e+04
2e+04
3e+04
4e+04
Bo
Figure 3: For each of the North Atlantic regions, posterior mean (solid lines) and 95% interval estimates (dashed lines) for the density of unfished biomass (B0 ) corresponding to natural mortality rate M = 0.2.
25
4.0e+05
6.0e+05
8.0e+05
1.0e+06
1.2e+06
2 -1
2.0e+05
4.0e+05
6.0e+05
8.0e+05
1.0e+06
1.2e+06
2.0e+05
6.0e+05
8.0e+05
S
Irish Sea Ricker Model
Irish Sea GP model
Irish Sea DP model
1.5e+04
2.0e+04
S
0.5 -0.5
log(R/S)
-2.5
-2.0
-1.5
-1.0
0.0 -0.5
log(R/S)
-1.0 -1.5 -2.0 -2.5
1.0e+04
1.2e+06
0.0
0.5
1.0 0.5 0.0 -0.5 -1.0 -1.5 -2.0
5.0e+03
1.0e+06
1.0
S
-2.5
0.0e+00
4.0e+05
S
1.0
2.0e+05
-2
-2
-1
0
log(R/S)
1 0
log(R/S)
1
2
3 2 1 log(R/S) 0 -1 -2
0.0e+00
log(R/S)
NE Arctic DP model 3
NE Arctic GP model 3
NE Arctic Ricker Model
5.0e+03
1.0e+04
1.5e+04 S
2.0e+04
5.0e+03
1.0e+04
1.5e+04
2.0e+04
S
Figure 4: For the NE Arctic region (top row) and the Irish Sea region (bottom row), posterior mean (solid lines) and 95% interval estimates (dashed lines) for the S-R regression curve based on the parametric Ricker model (left column), the semiparametric Gaussian process model (middle column), and the DP mixture model (right column). Each panel includes a plot of the corresponding data for stock biomass (S) and logreproductive success (log(R/S)).
26
Table 1: For the NE Arctic and Irish Sea regions, estimates of the goodness-of-fit term, G(m), penalty term, P (m), and the posterior predictive loss measure, D(m), under the parametric model (“Ricker”), the semiparametric model (“GP”), and the nonparametric mixture model (“DP”). NE Arctic Ricker
GP
G(m)
25.174
20.366
P (m)
43.551
D(m) 68.725
Irish Sea DP
GP
DP
23.031 21.951
6.529
11.673
54.979
26.159 34.259
28.833
12.571
75.345
49.190 56.210
35.362
24.244
27
Ricker
Appendix A: Prior specification and posterior simulation for the Dirichlet process mixture model Here, we present the details of the Markov chain Monte Carlo (MCMC) posterior simulation algorithm for the Dirichlet process (DP) mixture model presented in Section 3.1 of the paper. An approach to prior specification for the model hyperparameters is also discussed.
Prior specification: The approach taken to specifying the priors for the DP hyperparameters is to select hyperprior parameter values such that the resulting mixture covers the support of the underlying distribution. In particular, the approach is based on a small amount of prior information, using rough prior guesses at the center, say, cx and cy , and range, say, rx and ry , of stock biomass and log-reproductive success values, respectively. Let R be the 2 × 2 diagonal matrix with diagonal elements (ry /4)2 and (rx /4)2 , which are rough prior estimates for the variability in log-reproductive success and biomass values. For a default prior specification method, we consider a single component of the mixture model, N2 (·; µ, Σ), which is the limiting case of the DP mixture for α → 0+ . Under this version of the model, the marginal prior mean and covariance matrix for the data are given by am and (v − 3)−1 aQ BQ + (aV − 3)−1 BV + Bm , respectively. Hence, we set am = (cy , cx )T , and use matrix R to specify each of the components in the prior covariance above. To this end, the degrees of freedom, v, of the inverse Wishart distribution of Σ in G0 , and the corresponding parameters aV and aQ in the priors of V and Q are set at twice the dimension of the data vector, i.e., at 4. Note that 4 is the integer value of aV that yields finite expectation, and at the same time, the largest possible dispersion in the prior for V . Moreover, smaller values of (v − 3)−1 aQ result in more dispersed priors for Q. Finally, R is split evenly between 28
the three marginal prior covariance components to determine the diagonal matrices BQ , BV , and Bm . The DP prior precision parameter, α, controls the number, n∗ , of distinct mixture components (e.g., Escobar and West, 1995). In particular, for moderate to large sample sizes, a useful approximation to the prior expectation E(n∗ | α) is given by α log{(α + n)/α}. This expression can be averaged over the gamma(aα , bα ) prior for α to obtain E(n∗ ), thus selecting aα and bα to agree with a prior guess at the expected number of distinct mixture components. For instance, for the NE Arctic region cod data, we set am = (0.4, 3.5e05)T , BQ = diag(0.13, 6.3e09), BV = diag(0.52, 2.52e10), and Bm = diag(0.52, 2.52e10). Moreover, we used a gamma(4, 2) prior for α, implying E(n∗ ) ≈ 7.
MCMC Posterior inference: To sample from the posterior of the DP mixture model, p(p, θ, L, α, m, V, Q | data), we use a version of the blocked Gibbs sampler (Ishwaran and Zarepour, 2000; Ishwaran and James, 2001). Let zi = (yi , xi ), i = 1, . . . , n, and denote the n∗ distinct values in the vector of configuration variables, L = (L1 , . . . , Ln ), by L∗1 , . . . , L∗n∗ . Moreover, let Mj∗ = |{i : Li = L∗j }|, j = 1, . . . , n∗ , and Ml = |{Li : Li = l}|, l = 1, . . . , N . The updates of the θ l = (µl , Σl ), l = 1, ..., N , depend of the value of l. Specifically, for any l ∈ / {L∗j : j = 1, . . . , n∗ }, we have p(θ l | L, m, V, Q, data) = g0 (θ l ; m, V, Q), where g0 is the density of the DP prior base distribution G0 . Thus, in this case, we draw µl | m, V ∼ N2 (m, V ) and Σl | Q ∼ IW(v, Q). For l = L∗j , j = 1, . . . , n∗ , p(θ L∗j | L, m, V, Q, data) = g0 (θ L∗j ; m, V, Q)
Y
N2 (zi ; θ L∗j ).
{i:Li =L∗j }
Therefore, for any j = 1, ..., n∗ , we extend the Gibbs sampler to draw from the posterior 29
full conditionals for µL∗j and ΣL∗j :
−1 −1 µL∗j | m, V, ΣL∗j , data ∼ N2 (V −1 + Mj∗ Σ−1 m + Σ−1 L∗j ) (V L∗j
X
−1 zi ), (V −1 + Mj∗ Σ−1 L∗j )
{i:Li =L∗j }
X
ΣL∗j | Q, µL∗j , data ∼ IW v + Mj∗ , Q +
(zi − µL∗j )(zi − µL∗j )T .
{i:Li =L∗j }
The posterior full conditional for each Li is proportional to N2 (zi ; θ Li ) P resulting in a discrete distribution, N ˜li δl (·), with updated weights l=1 p p˜li = PN
pl N2 (zi ; θ l )
m=1
pm N2 (zi ; θ m )
PN
l=1
pl δl (·),
, l = 1, ..., N.
The posterior full conditional for p is proportional to f (p; α)
n X N Y
pl δ(Li ) = f (p; α)
i=1 l=1
N Y
l pM l .
l=1
Here, f (p; α) is the joint prior for p, conditionally on α, induced by the truncated stick-breaking construction given in Section 3.1 of the paper. Specifically, f (p; α) corresponds to a special case of the generalized Dirichlet distribution, f (p; α) =
αN −1 pα−1 N (1
−1
−1
− p1 ) (1 − (p1 + p2 ))
× · · · × (1 −
N −2 X
pl )−1
l=1
(Connor and Mosimann, 1969). Hence, up to its normalizing constant, the full conditional for p can be written as (M1 +1)−1
p1
(M
N −1 × · · · × pN −1
PN
+1)−1 (α+MN )−1 pN (1
PN
− p1 )(α+ l=2 Ml )−[(M2 +1)+(α+ l=3 Ml )] × XN −2 · · · × (1 − pl )(α+MN −1 +MN )−[(MN −1 +1)+(α+MN )] l=1
and thus, can be recognized as a generalized Dirichlet distribution with parameters P PN (M1 + 1, M2 + 1, . . . , MN −1 + 1) and (α + N k=2 Mk , α + k=3 Mk , . . . , α + MN ). Using the constructive definition of the generalized Dirichlet distribution, the vector p can 30
be generated by drawing latent V1∗ , . . . , VN∗ −1 , where the Vl∗ , l = 1, ..., N − 1, are Ql−1 P ∗ ∗ independent Beta(Ml +1, α+ N k=l+1 Mk ), and then setting p1 = V1 ; pl = Vl m=1 (1− P −1 Vm∗ ), l = 2, . . . , N − 1; and pN = 1 − N l=1 pl . Finally, standard updates can be used for the DP hyperparameters (m, V, Q) and α. Denote their corresponding priors (discussed in Section 3.1 of the paper) by π(m), π(V ), π(Q) and π(α). Then, the joint full conditional for the DP base distribution hyperparameters is given by ∗
p(m, V, Q | θ, L, data) ∝ π(m)π(V )π(Q)
n Y
g0 (θ L∗j ; m, V, Q).
j=1
Thus, m has a bivariate normal posterior full conditional with mean vector P ∗ −1 −1 −1 −1 −1 (n∗ V −1 + Bm ) (V −1 nj=1 µL∗j + Bm am ) and covariance matrix (n∗ V −1 + Bm ) . Also, the full conditional for V is an inverse Wishart with aV + n∗ degrees of freedom P ∗ and scale matrix given by BV + nj=1 (µL∗j − m)(µL∗j − m)T . The last hyperparameter, Q, has a Wishart posterior full conditional with degrees of freedom aQ + n∗ v and scale P ∗ −1 −1 matrix (BQ + nj=1 Σ−1 L∗j ) . Moreover, p(α | p) ∝ π(α)f (p; α) ∝ α(N +aα −1)−1 e−α(bα −log(pN )) , i.e., the posterior full conditional for α is given by a gamma distribution. Note that the rate parameter involves the last random weight, which may be very small causing numerical issues in the algorithm. This problem is avoided by replacing pN with its defining Beta draws, Q −1 ∗ pN = N m=1 (1 − Vm ). Therefore, α is updated from a gamma distribution with shape P −1 ∗ parameter aα + N − 1 and rate parameter bα − N m=1 log(1 − Vm ).
Estimation of the model comparison criterion: For formal comparison of the three models considered in Section 4.2 of the paper, we use a minimum posterior predictive loss criterion (Gelfand and Ghosh, 1998). The predictive loss measure, D(m), 31
under model m is computed as D(m) =
n X
Var(m) (ynew,i | data) +
n X
i=1
{yi − E(m) (ynew,i | data)}2 .
i=1
Here, estimation of E(m) (ynew,i | data) and Var(m) (ynew,i | data) requires expectations with respect to the posterior predictive distribution, under model m, for replicated response ynew,i with associated stock biomass value xi . Recall from Section 3.2 of the paper the notation for partitioning the mean vector and covariance matrix of the N2 (y, x; µl , Σl ) distribution. Then, under the DP mixture model, the expressions for the first and second posterior predictive moments given each stock biomass value, xi , i = 1, . . . , n, are as follows: −1
Z
E(y | xi , data) = {p(xi | data)}
−1
= {p(xi | data)}
yp(y, xi | data)dy Z X N
y yx xx−1 pl N(xi ; µxl , Σxx (xi − µxl )} × l ){µl + Σl Σl
l=1
p(θ, p | data)dθdp E(y 2 | xi , data) = {p(xi | data)}−1 −1
= {p(xi | data)}
Z
y 2 p(y, xi | data)dy
Z X N
y yx xx−1 pl N(xi ; µxl , Σxx (xi − µxl )}2 + l )[{µl + Σl Σl
l=1 yx xx−1 xy Σl ]p(θ, p | data)dθdp Σyy l − Σl Σl
where p(xi | data) =
R PN
l=1
(m) pl N(xi ; µxl , Σxx (ynew,i | data) l )p(θ, p | data)dθdp. The E
are obtained from Monte Carlo integration of E(y | xi , data) and the Var(m) (ynew,i | data) are calculated with Monte Carlo integration of E(y 2 | xi , data)−{E(y | xi , data)}2 .
32
Appendix B: Posterior simulation for the parametric and semiparametric models of the comparison study Here, we describe MCMC fitting for the parametric Ricker model and the semiparametric Gaussian process (GP) model used in the comparison study presented in Section 4.2 of the paper.
Parametric model: For the Ricker model, denote by a0 and b0 the mean and variance of the normal prior for β0 , and by aσ and bσ the shape and rate parameters of the inverse gamma prior for σ 2 . The model can be fitted with an MCMC algorithm based on P P a normal full conditional for β0 with mean (b0 ni=1 yi + b0 β1 ni=1 Si + a0 σ 2 )/(nb0 + σ 2 ) and variance (b0 σ 2 )/(nb0 + σ 2 ), and an inverse gamma full conditional for σ 2 with P shape parameter aσ + 0.5n and rate parameter given by bσ + 0.5 ni=1 (yi − β0 + β1 Si )2 . Moreover, we update β1 with a random walk Metropolis-Hastings step, using a normal proposal distribution on the logarithmic scale. The elements of the predictive loss criterion, E(m) (ynew,i | data) and Var(m) (ynew,i | data), i = 1, . . . , n (see Section 4.2 of the paper and Appendix A), are estimated using draws from the posterior predictive distribution at each Si , i = 1, . . . , n, which R is given by p(ynew,i | data) = N(ynew,i ; β0 − β1 Si , σ 2 )p(β0 , β1 , σ 2 | data) dβ0 dβ1 dσ 2 . Each of these predictive distributions is readily sampled using the posterior draws for (β0 , β1 , σ 2 ).
Semiparametric model: Turning to the GP regression model, let N(m, s2 ) be the prior for µ, denote by inverse-gamma(aτ , bτ ) and inverse-gamma(aσ , bσ ) the priors for τ 2 and σ 2 , respectively, and let (0,bφ ) be the support of the uniform prior for φ. The mean of the normal prior for µ is set at a prior guess on the center of the log-reproductive 33
success values, and the variance is set equal to 10(ry /4)2 , where ry is a prior guess at the range of the log-reproductive success values. We also set aτ = aσ = 2 resulting in inverse gamma priors for τ 2 and σ 2 with infinite prior variances. The rate parameters of both priors are also specified through 10(ry /4)2 . Under the exponential correlation function for the GP prior, 3/φ is the range of dependence, that is, the distance between stock biomass values, d = |S − S 0 |, such that the correlation between h(S) and h(S 0 ) is approximately 0.05. The prior for φ is based on the maximum difference between stock biomass values, dmax = max |S − S 0 |, choosing bφ such that 0.01dmax = 3/bφ . Regarding posterior simulation, the model can be fitted with a Gibbs sampler based on standard updates for all parameters except φ. Let y = (y1 , ..., yn ), θ be the ndimensional vector with θi = h(Si ), i = 1, . . . , n, In the identity matrix of dimension n, and 1n the vector of dimension n with each of its element equal to 1. Then, the posterior full conditionals are given by: θ | µ, σ 2 , τ 2 , φ, data ∼ Nn (σ −2 In + C −1 )−1 (σ −2 y + µC −1 1n ), (σ −2 In + C −1 )−1 T −1 1n C θ + ms−2 1 2 , µ | θ, τ , φ ∼ N 1Tn C −1 1n + s−2 1Tn C −1 1n + s−2 Xn (yi − θi )2 σ 2 | θ, data ∼ inverse-gamma aσ + 0.5n, bσ + 0.5
i=1
τ 2 | θ, µ, φ ∼ inverse-gamma aτ + 0.5n, bτ + 0.5(θ − µ1n )T H −1 (θ − µ1n ) φ | θ, µ, τ 2 ∝ |C|−1/2 exp{−0.5(θ − µ1n )T C −1 (θ − µ1n )}1(0 < φ < bφ ) where H is the observed correlation matrix with elements Hij = exp{−φ|Si − Sj |}, and C = τ 2 H is the observed covariance matrix. The parameter φ is updated with a Metropolis-Hastings step using a normal proposal distribution on the logarithmic scale. Posterior predictive inference for the S-R relationship, h(S), is obtained through the n values in vector θ augmented with a set of M new stock biomass values, S˜ = ˜ = (θ˜1 , ..., θ˜M ), where θ˜j = h(S˜j ), j = 1, ..., M . The vector θ, ˜ (S˜1 , . . . , S˜M ). Let θ 34
conditionally on θ and the GP hyperparameters, follows an M -variate normal distribution with mean vector (µ1M + C M n C −1 (θ − µ1n )) and covariance matrix C M M − C M n C −1 (C M n )T , where CijM n = τ 2 exp{−φ|S˜i − Sj |}, and CijM M = τ 2 exp{−φ|S˜i − S˜j |}. ˜ at each iteration Therefore, a posterior realization for h(S) is obtained through {θ, θ} of the MCMC using the currently imputed parameter values. The E(m) (ynew,i | data) and Var(m) (ynew,i | data), i = 1, . . . , n, needed to calculate the posterior predictive loss criterion (see Section 4.2 of the paper or Appendix A), can be computed through the mean and variance, respectively, of replicated logreproductive success values corresponding to each observed stock biomass level. For each observed stock biomass value Si , i = 1, ..., n, the replicated responses are sampled from the associated posterior predictive distribution given by p(ynew,i | data) = R N(ynew,i ; θi , σ 2 )p(θi , σ 2 | data) dθi dσ 2 .
35
REFERENCES Connor, R.J. and Mosimann, J.E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association 64, 194-206. Escobar, M. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90, 577-588. Gelfand, A. and Ghosh, S. (1998). Model choice: A minimum posterior predictive loss approach. Biometrika 85, 1-11. Ishwaran, H. and James, L.F. (2001). Gibbs Sampling for Stick-Breaking Priors. Journal of the American Statistical Association 96, 161-73. Ishwaran, H. and Zarepour, M. (2000). Markov Chain Monte Carlo in Approximate Dirichlet and Beta Two-Parameter Process Hierarchical Models. Biometrika 87, 371-390.
36