RACSAM (2013) 107:459–473 DOI 10.1007/s13398-012-0072-8 ORIGINAL PAPER
Bayesian inference for controlled branching processes through MCMC and ABC methodologies Miguel González · Cristina Gutiérrez · Rodrigo Martínez · Inés M. del Puerto
Received: 18 February 2012 / Accepted: 5 June 2012 / Published online: 24 June 2012 © Springer-Verlag 2012
Abstract The controlled branching process (CBP) is a generalization of the classical Bienaymé–Galton–Watson branching process and, in the terminology of population dynamics, is used to describe the evolution of populations in which the population size must be controlled at each generation. In this work we deal with the problem of estimating the offspring distribution and its main parameters for a CBP with a deterministic control function, assuming that the only observable data are the total number of individuals in each generation. We tackle the problem from a Bayesian perspective in a nonparametric context. We consider a Markov chain Monte Carlo (MCMC) method, in particular the Gibbs sampler, and approximate Bayesian computation (ABC) methodology. The first is a data imputation method and the second relies on numerical simulations. Through a simulated experiment we evaluate the accuracy of the MCMC and ABC techniques and compare their performances. Keywords Controlled branching process · Bayesian inference · Gibbs sampler · Approximate Bayesian computation · Non-parametric
This research was supported by the Ministerio de Ciencia e Innovación and the FEDER through the grant MTM2009-13248.

M. González · C. Gutiérrez · R. Martínez · I. M. del Puerto (B)
Department of Mathematics, University of Extremadura, Badajoz, Spain
1 Introduction

The long-term behaviour of many populations depends on the offspring reproduction model and on the life span of their individuals. The theory of branching processes was developed to deal with such models. The term "branching process" was introduced by Kolmogorov in the 1940s, but the subject is much older, going back more than a century and a half. It was initially motivated by the wish to explain the extinction of certain family lines of the European aristocracy, and names like Bienaymé, Galton, and Watson are linked to those early studies. Nowadays, these processes are studied extensively both for their mathematical interest and as theoretical approaches to problems in applied fields such as Biology (gene amplification, the clonal resistance theory of cancer cells, polymerase chain reactions, etc.), Epidemiology, Genetics, and Cell Kinetics (the evolution of infectious diseases, sex-linked genes, stem cells, etc.), Computer Algorithms and Economics, and, of course, Population Dynamics, to mention only some of the more important applications.

In particular, in this work we are interested in the class of controlled branching processes. These are discrete-time stochastic processes that model a population developing in the following manner: at generation 0 the population consists of a fixed number of individuals or progenitors; each of them, independently of the others and in accordance with a common probability distribution, gives rise to offspring and then ceases to participate in subsequent reproduction. Thus, each individual lives for one unit of time and is replaced by a random number of offspring.
Moreover, since for reasons of an environmental, social, or other nature the number of progenitors taking part in each generation must be controlled, a control mechanism is introduced into the model to determine the number of individuals with reproductive capacity in each generation. Mathematically, a controlled branching process (CBP), {Z_n}_{n≥0}, is defined recursively as

    Z_0 = N ≥ 1,    Z_{n+1} = Σ_{i=1}^{φ(Z_n)} X_{ni},    n = 0, 1, . . . ,    (1)
where {X_{ni} : i = 1, 2, . . . ; n = 0, 1, . . .} is a sequence of independent and identically distributed non-negative integer-valued random variables and φ is a function assumed to be non-negative and integer-valued for integer-valued arguments. The empty sum in (1) is defined to be 0. Intuitively, Z_n denotes the number of individuals in generation n, and X_{ni} the number of offspring of the ith individual in generation n. The probability law {p_k}_{k≥0} is called the offspring distribution or reproduction law, and m and σ² are, respectively, the offspring mean and variance (both assumed finite). With respect to the control function φ, if φ(Z_n) < Z_n, then Z_n − φ(Z_n) individuals are artificially removed from the population and therefore do not participate in the future evolution of the process. If φ(Z_n) > Z_n, then φ(Z_n) − Z_n new individuals of the same type are added to the population, participating under the same conditions as the others. No control is applied to the population when φ(Z_n) = Z_n.

It is easy to see that {Z_n}_{n≥0} is a homogeneous Markov chain. Moreover, assuming that φ(0) = 0 and p_0 > 0, the classical extinction–explosion duality of branching process theory holds, i.e., P(Z_n → 0) + P(Z_n → ∞) = 1. Theoretical aspects of this process have been tackled in several papers; see, for example, [1,13,14,24,30,35]. Various extensions have been considered. It is worth highlighting the controlled branching processes with random control function [7,8,15–20,25,34]. For these processes, for each n = 0, 1, . . ., independent stochastic processes {φ_n(k)}_{k≥0} are
introduced, with identical one-dimensional probability distributions and independent of the offspring distribution. Thus, in each generation n = 1, 2, . . . with size Z_n, the number of progenitors for the next generation is determined by the random variable φ_n(Z_n). It is clear that the CBP (1) is a particular case of a CBP with random control function.

Few of the quoted references deal with statistical issues, and most of those that do adopt a frequentist viewpoint. Indeed, the only paper that considers a Bayesian outlook is [25]. In particular, it considers a CBP with random control function whose offspring distribution belongs to the power series family, and establishes the asymptotic normality of the posterior distribution of the offspring mean. The aim of this paper is to develop Bayesian inference, in a nonparametric scenario, for the class of CBPs with a deterministic control function. Mainly, inference on the offspring distribution and on its main moments is developed. Branching process theory has usually assumed that the entire family tree must be observed in order to make nonparametric inferences on the offspring distribution (see [21,23]). As a novelty, we consider in this paper that the observable sample consists only of the generation-by-generation population sizes. In this case, the problem can be treated as an incomplete data problem. To make inference based on this sample we first consider a traditional Markov chain Monte Carlo (MCMC) method, the Gibbs sampler. After that, approximate Bayesian computation (ABC) methods are considered. ABC methods have been developed over the last decade as an alternative to such more traditional MCMC methods.
These likelihood-free techniques are well suited to models for which the likelihood of the data is mathematically or computationally intractable but from which it is easy to simulate, so they are very appropriate for inference on CBPs. Through a simulated example we first evaluate the accuracy of the MCMC method in estimating the reproduction law and the offspring mean and variance. To this end we rely on well-known theoretical convergence results as well as on practical procedures for checking the numerical convergence of MCMC methods. We then analyze whether ABC techniques can provide accurate estimates of the parameters of interest. In the literature, no methods are available to evaluate the approximations given by the ABC methodology. Thus, the performance indicator we consider for the ABC procedure is its ability to provide posterior distributions close enough to those given by MCMC (as was also proposed in [5]). Mainly, we evaluate the accuracy of the ABC methods by focussing on the performance of the estimate of the offspring mean.

The paper is organized as follows. Section 2 describes an algorithm based on the Gibbs sampler, and Sect. 3 is devoted to ABC methods: we describe the rejection ABC algorithm and two post-processing schemes that change the analysis of the ABC output. Section 4 illustrates the described algorithms with a simulated example and compares their performances. Throughout the paper we shall consider a CBP with known control function φ and with an offspring distribution p = {p_k} whose support, S, is assumed to be finite and, for simplicity, known; we denote its cardinality by s. Let Zn = {Z_0, . . . , Z_n} denote the observed population sizes, π(p) the prior probability density for p, and π(p | Zn) the posterior distribution of p after observing Zn.
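The recursion (1), on which both inferential approaches below rest, is straightforward to simulate. A minimal Python sketch, where the offspring law, the cap-style control function, and all names are illustrative choices, not taken from the paper:

```python
import numpy as np

def simulate_cbp(p, phi, z0, n_gen, rng=None):
    """Simulate Z_0, ..., Z_{n_gen} from the recursion (1).

    p   : offspring distribution (p_k) on the support {0, 1, ..., s-1}
    phi : deterministic control function, integer -> non-negative integer
    z0  : initial number of progenitors N >= 1
    """
    rng = np.random.default_rng(rng)
    z = [z0]
    for _ in range(n_gen):
        parents = phi(z[-1])                  # controlled number of progenitors
        # Z_{n+1} is the sum of `parents` i.i.d. offspring counts X_ni
        offspring = rng.choice(len(p), size=parents, p=p)
        z.append(int(offspring.sum()))
    return z

# hypothetical offspring law and a control that caps the progenitors at 10
trajectory = simulate_cbp([0.25, 0.5, 0.25], phi=lambda z: min(z, 10), z0=5, n_gen=20)
```

Note that when φ(Z_n) = 0 the empty sum in (1) yields Z_{n+1} = 0, as in the definition.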
2 Gibbs sampler

We shall describe an algorithm based on the Gibbs sampler (see, for instance, [6]) to approximate the posterior offspring distribution of a CBP by observing only Zn, the control function φ being known.
Let us introduce Z_l(k), k ∈ S, l = 0, 1, . . . , n − 1, as the random variable that represents the number of individuals in the lth generation with exactly k offspring, and denote Zn* = {Z_l(k), k ∈ S, l = 0, 1, . . . , n − 1}. Mathematically,

    Z_l(k) = Σ_{j=1}^{φ(Z_l)} I_{{X_{lj} = k}},
with I_A standing for the indicator function of the set A. Taking advantage of conjugacy, let us consider as prior distribution for p a Dirichlet distribution with parameter α = (α_k, k ∈ S), α_k > 0, i.e.

    π(p) = d(α) Π_{k∈S} p_k^{α_k − 1},    with d(α) = Γ(α_*) ( Π_{k∈S} Γ(α_k) )^{−1},

where α_* = Σ_{k∈S} α_k and Γ(·) denotes the Gamma function. It is easy to check that the likelihood function based on Zn* is proportional to

    Π_{k∈S} p_k^{Z*_{n,k}},    (2)
with Z*_{n,k} = Σ_{l=0}^{n−1} Z_l(k). We consider the unobservable variables Z_l(k), k ∈ S, l = 0, 1, . . . , n − 1, as latent variables, and the augmented parameter vector (p, Zn*). We shall approximate the posterior distribution of (p, Zn*) after observing Zn, denoted by π(p, Zn* | Zn), and from this obtain an approximation for π(p | Zn). In order to sample from the posterior distribution π(p, Zn* | Zn) with the Gibbs sampler, it is only necessary to determine the conditional posterior distribution of p after observing Zn and Zn*, denoted by π(p | Zn, Zn*), and the conditional distribution of Zn* after observing (p, Zn), denoted by f(Zn* | p, Zn). Taking into account that

    φ(Z_l) = Σ_{k∈S} Z_l(k)    and    Z_{l+1} = Σ_{k∈S} k Z_l(k),    l = 0, . . . , n − 1,    (3)

π(p | Zn, Zn*) is the same as π(p | Zn*). Now, from (2) it is deduced that

    π(p | Zn*) ∝ d(β) Π_{k∈S} p_k^{β_k − 1},    (4)

with β_k = α_k + Z*_{n,k} and β = (β_k, k ∈ S). With respect to f(Zn* | p, Zn), since the individuals reproduce independently, one has that

    f(Zn* | p, Zn) = Π_{l=0}^{n−1} f({Z_l(k) : k ∈ S} | p, Z_l, Z_{l+1}),

with f({Z_l(k) : k ∈ S} | p, Z_l, Z_{l+1}) denoting the conditional distribution of the random vector (Z_l(k), k ∈ S) given p, Z_l, and Z_{l+1} (the proof follows steps similar to those developed in the Appendix of [12]). Moreover, it is easy to see that the distribution of {Z_l(k) : k ∈ S} given p, Z_l, and Z_{l+1} is obtained from a multinomial distribution with size φ(Z_l) and probabilities given by p, normalized by taking into account the constraint Z_{l+1} = Σ_{k∈S} k Z_l(k). Once it is known how to obtain samples from the distributions π(p | Zn, Zn*) and f(Zn* | p, Zn), the Gibbs sampler algorithm works in the following way:
Gibbs sampler algorithm

    generate p^(0) ∼ π(p)
    for i = 1, 2, . . . do
        generate Zn*^(i) ∼ f(Zn* | Zn, p^(i−1))
        generate p^(i) ∼ π(p | Zn*^(i))
    end for
Thus, for a run of the sequence {(p^(l), Zn*^(l))}_{l≥0}, we choose Q + 1 vectors p^(N), p^(N+G), . . . , p^(N+QG), with N, G, Q > 0. These vectors are approximately independent sampled values of the distribution π(p | Zn) if G and N are large enough (see details in [32]). Since these vectors could be affected by the initial state p^(0), we apply the algorithm T times, obtaining a final sample of length T(Q + 1).
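The scheme above might be sketched in Python as follows. This is an illustrative implementation, not the authors' code: the conditional f(Zn* | p, Zn), a multinomial constrained by Z_{l+1} = Σ_k k Z_l(k), is drawn here by naive rejection, which can be slow for large populations; burn-in and thinning play the roles of N and G.

```python
import numpy as np

def gibbs_offspring(z_obs, phi, alpha, n_iter, burn, thin, rng=None):
    """Gibbs sampler for the posterior of the offspring law p given Zn.

    z_obs : observed population sizes [Z_0, ..., Z_n]
    phi   : known deterministic control function
    alpha : Dirichlet prior parameters (alpha_k, k in S)
    Returns the retained (thinned) draws of p as rows of an array.
    """
    rng = np.random.default_rng(rng)
    alpha = np.asarray(alpha, dtype=float)
    ks = np.arange(len(alpha))                 # support S = {0, ..., s-1}
    p = rng.dirichlet(alpha)                   # p^(0) ~ pi(p)
    draws = []
    for it in range(n_iter):
        # impute the latent counts Z_l(k): multinomial(phi(Z_l), p)
        # conditioned on sum_k k * Z_l(k) = Z_{l+1}, via naive rejection
        z_star = np.zeros_like(alpha)
        for l in range(len(z_obs) - 1):
            parents, z_next = phi(z_obs[l]), z_obs[l + 1]
            while True:
                counts = rng.multinomial(parents, p)
                if (ks * counts).sum() == z_next:
                    break
            z_star += counts
        p = rng.dirichlet(alpha + z_star)      # pi(p | Zn*) = Dirichlet(beta), cf. (4)
        if it >= burn and (it - burn) % thin == 0:
            draws.append(p)
    return np.asarray(draws)
```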
3 Approximate Bayesian computation

ABC ideas initially came from the field of population genetics, although they were quickly extended to a great variety of scientific application areas. The basic idea is to simulate a large number of data sets from a model depending on a parameter drawn from a prior distribution, and to calculate for each simulated data set summary statistics that are compared with those of the observed sample. An approximate sample from the posterior distribution is given by the parameters that provide summary statistics close enough to the summary statistics of the observed sample. That is, the aim of the ABC methodology is to provide samples from a distribution which is a good (enough) approximation of the target posterior distribution. A survey of ABC algorithms can be found in [22].

For a CBP, given a specific p, it is easy to simulate the entire family tree up to the current nth generation, thereby obtaining samples of the random variables {Z_l(k), k ∈ S, l = 0, 1, . . . , n − 1} and, from (3), the population size in each generation. Let f(Zn* | p) denote the probability density of Zn* given p. As before, Zn = {Z_0, . . . , Z_n}, and consider the observed data from the model (1), Zn^obs = {Z_0^obs, . . . , Z_n^obs}. We describe the rejection ABC algorithm and two post-processing schemes to obtain good approximations to π(p | Zn).

3.1 Likelihood-free sampler: tolerance rejection algorithm

This algorithm is an adaptation of that proposed in [28]. It is based on summary statistics, S(·), calculated for Zn*, and on a distance, ρ(·, ·), on S(Zn*). For a given ε > 0, known as a tolerance level, the algorithm proposed below provides samples from π(p | ρ(S(Zn), S(Zn^obs)) ≤ ε), which allow us to obtain a good approximation to π(p | Zn^obs) by using a small enough ε and good choices of S(·) and ρ(·, ·). Taking into account that the available sample is the total population size in each generation and that our aim is to obtain an approximation of π(p | Zn), we consider S(Zn*) = Zn; that is, from a simulation of the entire family tree up to generation n we keep the total population size in each generation of the simulated data. Indeed, we implicitly assume that we work with the entire observed data, Zn, rather than building reduced-dimension summary statistics of them, as is usually done in practice. This assumption is also considered in a Markovian context in [9]. The search for sufficient statistics for p which would guarantee π(p | Zn) = π(p | S(Zn)) is difficult.
Several metrics can be proposed to evaluate when the simulated data match the observed data. The more intuitive and usual ones are:

– the ℓ1 metric:

    ρ_1(Zn, Zn^obs) = Σ_{i=1}^n |Z_i^obs − Z_i|;

– the Euclidean metric:

    ρ_e(Zn, Zn^obs) = ( Σ_{i=1}^n (Z_i^obs − Z_i)^2 )^{1/2}.

We also consider the Hellinger metric, defined as

    ρ_h(Zn, Zn^obs) = ( Σ_{i=1}^n ((Z_i^obs)^{1/2} − Z_i^{1/2})^2 )^{1/2}.

This was first considered in [3] to deal with efficiency and robustness properties in parametric estimation problems. Finally, given the nature of our particular data, we can also consider the following metric:

    ρ_s(Zn, Zn^obs) = | (1/n) Σ_{i=1}^n Z_i − (1/n) Σ_{i=1}^n Z_i^obs | + (1/2) Σ_{j=1}^n | Z_j / Σ_{i=1}^n Z_i − Z_j^obs / Σ_{i=1}^n Z_i^obs |.
This metric is a modification of that proposed in [26] so as to obtain the symmetry property. The first summand measures the difference between the total progeny in the observed and the simulated data, whereas the second summand is the total variation distance between the two vectors of per-generation proportions of individuals with respect to the total progeny, in the simulated sample and in the observed data. With a view to estimating the offspring distribution, this metric accepts as valid those simulated data that provide a progeny similar to that of the observed data and a small total variation distance between the two vectors of proportions.

Let us formulate the tolerance rejection ABC algorithm for a generic metric ρ(·, ·).

Likelihood-free sampler: tolerance rejection algorithm

    for i = 1 to m do
        repeat
            generate p ∼ π(p)
            generate Zn* from the likelihood f(Zn* | p)
        until ρ(Zn, Zn^obs) ≤ ε, with Zn = S(Zn*)
        set p^(i) = p
    end for

In Sect. 4 we compare the effect of these metrics through a simulated example.

3.2 Likelihood-free sampler: local linear regression algorithm

In [2] an extension of the rejection ABC algorithm is proposed. It is based on a local-linear regression fit of the simulated parameters on the simulated summary statistics, and then
to use this to predict the true parameter values by substituting the observed sample into the regression equation. Let us develop the algorithm. Suppose that we have simulated independent pairs (p^(i), Zn*^(i)), i = 1, . . . , m, and that on the simulated data we calculate the summary statistic S(·) described in the previous section, obtaining the independent pairs (p^(i), Zn^(i)), i = 1, . . . , m. The following regression model is considered:

    p^(i) = α + Σ_{j=1}^n (Z_j^(i) − Z_j^obs) β_j + E_i,    i = 1, . . . , m,    (5)

where α and β_j, j = 1, . . . , n, are s-dimensional parameter row vectors (recall that s is the cardinality of the support of the offspring distribution), and the s-dimensional row random vectors E_i, i = 1, . . . , m, are formed by zero-mean, uncorrelated, constant-variance random variables. Equivalently, (5) is written as

    P = DB + E,

with P = (p^(1), . . . , p^(m))^t, D = (d_ij), i ∈ {1, . . . , m}, j ∈ {1, . . . , n + 1}, where d_{i,1} = 1 and d_{i,j} = Z_{j−1}^(i) − Z_{j−1}^obs for j = 2, . . . , n + 1, B = (α, β_1, . . . , β_n)^t, and E = (E_1, . . . , E_m)^t.

The linearity and additivity assumptions in (5) will often be implausible globally; however, they can be applied locally. Thus, the estimate of B is obtained by minimizing the weighted least-squares criterion

    Σ_{i=1}^m || p^(i) − D_i. B ||^2 W_i,    (6)

where D_i. is the ith row of the matrix D and {W_i} is a collection of weights. The solution to (6) is given by

    B̂ = (D^t W D)^{−1} D^t W P,    (7)

with W a diagonal matrix whose ith diagonal element is W_i. In [2] the use of the Epanechnikov kernel to determine the weights W_i is recommended. Thus, for each i ∈ {1, . . . , m}, let t_i = ρ(Zn^(i), Zn^obs), and

    W_i = c ε^{−1} (1 − (t_i/ε)^2)  if t_i ≤ ε,    W_i = 0  if t_i > ε,    (8)

with c a normalizing constant and ε the tolerance level.
Considering a generic metric ρ(·, ·), the following algorithm is proposed:

Likelihood-free sampler: local-linear regression algorithm

    for i = 1 to m do
        generate p^(i) ∼ π(p)
        generate Zn*^(i) from the likelihood f(Zn* | p^(i))
        calculate Zn^(i) = S(Zn*^(i)) and t_i = ρ(Zn^(i), Zn^obs)
    end for
    pick out the runs with t_i ≤ ε; let I be the index set of the selected runs
    define the weights W_i according to (8), i ∈ I
    solve (6) to obtain B̂
    calculate

        p^(i*) = p^(i) − (Zn^(i) − Zn^obs)(β̂_1, . . . , β̂_n)^t,    (9)

the selected sample being (p^(i*), Zn^(i)), i ∈ I.

The outputs p^(i*) are s-dimensional vectors whose coordinates sum to one but, owing to the regression adjustment, some coordinates can be negative. Such outputs must be removed from the sample. An alternative that guarantees that the adjusted parameters are probability vectors is to consider a transformation of the original responses. In our case, we set out a multinomial logistic regression by using as responses

    q^(i) = ( log(p_1^(i)/p_s^(i)), . . . , log(p_{s−1}^(i)/p_s^(i)) ),

and solve Q = DB + E, with Q = (q^(1), . . . , q^(m))^t, keeping the same notation for D and B (note that now α and β_j, j = 1, . . . , n, are (s − 1)-dimensional parameter row vectors). Finally, setting q^(i*) = q^(i) − (Zn^(i) − Zn^obs)(β̂_1, . . . , β̂_n)^t, Eq. (9) in the algorithm is replaced by p^(i*) = (p_1^(i*), . . . , p_s^(i*)), with

    p_j^(i*) = exp(q_j^(i*)) / ( 1 + Σ_{k=1}^{s−1} exp(q_k^(i*)) ),    j = 1, . . . , s − 1,

and

    p_s^(i*) = 1 / ( 1 + Σ_{k=1}^{s−1} exp(q_k^(i*)) ).
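The logistic transformation and its inverse might be coded as follows; this sketch assumes every coordinate of p is strictly positive, and the inverse map yields a probability vector by construction:

```python
import numpy as np

def to_logits(p):
    """q_j = log(p_j / p_s), j = 1, ..., s-1: responses for the logistic regression."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1] / p[-1])

def from_logits(q):
    """Inverse map: p_j = exp(q_j)/(1 + sum_k exp(q_k)) and
    p_s = 1/(1 + sum_k exp(q_k)), so the output always lies on the simplex."""
    e = np.exp(np.asarray(q, dtype=float))
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)
```

Applying the regression adjustment on the logit scale and mapping back with from_logits avoids the negative coordinates that Eq. (9) can produce on the original scale.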
As suggested in [2], for both the rejection and the regression methods we set the tolerance ε to be a quantile, q_δ, of the empirical distribution function of the simulated distances ρ(Zn, Zn^obs), with Zn = S(Zn*). For instance, δ = 0.15 for the regression methods means that the 15 % of the simulated S(Zn*) that provide the smallest values of ρ(Zn, Zn^obs) are assigned a nonzero weight.

4 Simulated example

To compare the performance of the algorithms previously described, we consider a particular case of a control function which keeps the number of progenitors between two bounds. In an ecological context, a CBP with this kind of control function would be very useful,
Fig. 1 Evolution of the simulated population sizes, {Z_0^obs, . . . , Z_40^obs} (individuals per generation, generations 0–40)
for example, to model the evolution of an invasive animal species that is widely recognized as a threat to native ecosystems but over whose eradication there is disagreement. That is, while the presence of the species is appreciated by a part of society, it is known that, if its numbers are left uncontrolled, it is very harmful to native ecosystems. In such a case it is better to control the population to keep it between admissible limits, even though this might mean periods when animals have to be culled. Two examples of recent discussions of this topic are [10] and [33]. Thus, we have considered a CBP with offspring distribution p_0 = 0.28398, p_1 = 0.42014, p_2 = 0.23309, p_3 = 0.05747, p_4 = 0.00531 and control function φ(x) = 7I{0