Journal of Applied Ecology 2009, 46, 1048–1053
doi: 10.1111/j.1365-2664.2009.01706.x
METHODOLOGICAL INSIGHTS
Integral projection models perform better for small demographic data sets than matrix population models: a case study of two perennial herbs
Satu Ramula1*, Mark Rees2 and Yvonne M. Buckley1,3
1The University of Queensland, School of Biological Sciences, Queensland 4072, Australia; 2Department of Animal and Plant Sciences, University of Sheffield S10 2TN, UK; and 3CSIRO Sustainable Ecosystems, 306 Carmody Rd, St. Lucia, Queensland 4067, Australia
Summary
1. Matrix population models are widely used to describe population dynamics, conduct population viability analyses and derive management recommendations for plant populations. For endangered or invasive species, management decisions are often based on small demographic data sets. Hence, there is a need for population models which accurately assess population performance from such small data sets.
2. We used demographic data on two perennial herbs with different life histories to compare the accuracy and precision of the traditional matrix population model and the recently developed integral projection model (IPM) in relation to the amount of data.
3. For large data sets both matrix models and IPMs produced identical estimates of population growth rate (λ). However, for small data sets containing fewer than 300 individuals, IPMs often produced smaller bias and variance for λ than matrix models despite different matrix structures and sampling techniques used to construct the matrix population models.
4. Synthesis and applications. Our results suggest that the smaller bias and variance of λ estimates make IPMs preferable to matrix population models for small demographic data sets with a few hundred individuals. These results are likely to be applicable to a wide range of herbaceous, perennial plant species where demographic fate can be modelled as a function of a continuous state variable such as size. We recommend the use of IPMs to assess population performance and management strategies particularly for endangered or invasive perennial herbs where little demographic data are available.
Key-words: demography, integral projection model, management, matrix population model, plant population dynamics, population growth rate, population viability analysis
© 2009 The Authors. Journal compilation © 2009 British Ecological Society
*Correspondence author. E-mail: satu.ramula@utu.fi
Introduction
Matrix population models, where individuals are divided into distinct classes based on their size, age or life history stage, are widely used in plant demographic studies to assess population performance. These models allow the long-term population growth rate (λ) and other population parameters to be estimated from individual-level data on survival, growth and fecundity (Caswell 2001). In addition to the traditional use of matrix models for population viability analyses (Menges 2000), matrix models have been used to analyse complex population dynamics and species interactions (e.g. Smith, Caswell & Mettler-Cherry 2005; Ramula, Toivonen & Mutikainen 2007), to produce harvest and restoration recommendations (e.g. Freckleton et al. 2003; Linares, Coma & Zabala 2008), and to assess alternative management strategies for invasive plant species (reviewed in Ramula et al. 2008).
Despite the popularity of matrix models, they have some limitations, which may decrease the accuracy and precision of estimated population parameters. First, all individuals within the same class are assumed to have identical demographic rates (Caswell 2001). This simplification makes matrix models sensitive to the selected matrix structure and therefore, different matrix structures may produce divergent estimates of λ (Ramula & Lehtilä 2005). The second limitation of matrix models is the large number of individuals required in each class to estimate demographic rates accurately. Small sample sizes often lead to the pooling of individuals with very different demographic rates within classes.
An alternative to matrix models is the integral projection model (IPM), which has many properties similar to matrix models, and can be parameterized from the same data using regression equations (Easterling, Ellner & Dixon 2000). For IPMs, demographic rates are modelled as a continuous function of an individual’s state rather than dividing individuals into distinct classes. IPMs were introduced to plant ecology a decade ago (Easterling et al. 2000) and their use is increasing rapidly (e.g. Rees & Rose 2002; Childs et al. 2004; Rose, Louda & Rees 2005; Williams & Crone 2006; Hesse, Rees & Müller-Schärer 2008; Kuss et al. 2008). Using a large (n > 600 individuals) data set Easterling et al. (2000) found that matrix models and IPMs produced identical estimates of λ for a perennial herb, and recent applications of IPMs are usually based on data sets with hundreds of individuals (Childs et al. 2004; Ellner & Rees 2006, 2007; Hesse et al. 2008; Kuss et al. 2008). For endangered or invasive plant species, such large data sets are often lacking and population dynamics and management strategies must be assessed based on the available small data sets (Menges 2000; Simberloff 2003; Buckley et al. 2005). Hence, there is a need for models which reliably predict population dynamics from small demographic data sets, making the application of IPMs of great interest. We might generally expect IPMs to be more reliable than matrix models because they require fewer parameters to be estimated and these are estimated from the complete data set, rather than by dividing the data into classes.
However, the magnitude of this effect has not been quantified, and we currently have no evidence that IPMs are more suitable for small data sets than matrix population models.
Many factors affect the accuracy and precision of λ estimates, including distance from the stable stage distribution, the variability of demographic rates and sample size (Caswell 2001). Small sample size may produce inaccurate estimates of λ because of large sampling error (Fiske, Bruna & Bolker 2008). One way to minimize sampling error for matrix model estimates is to focus the greatest sampling effort on the life stage(s) to which λ is most sensitive (Gross 2002). If no a priori knowledge of the importance of different demographic transitions to λ is available, the best accuracy for matrix models is achieved by sampling an equal number of individuals for all matrix classes (Münzbergová & Ehrlén 2005). In addition to model accuracy, the precision of the model is important. A model that produces precise but biased estimates is still useful if the magnitude of the bias is known and can therefore be corrected.
We explore the accuracy and precision of matrix population models and IPMs in relation to the amount of demographic data using two perennial herbs with different life histories (Cirsium palustre and Primula veris). We also compare two different techniques to parameterize an IPM, first using a constant regression model structure derived from the full data set and second, allowing the regression model structure to vary according to the data set at hand, which is sub-sampled from the full data set. We start by constructing a matrix population model and an IPM from the full data sets. We then reduce the number of individuals by sub-sampling the full data sets with and without replacement, and compare the accuracy and precision of λ in relation to full data sets. For the matrix models we use two alternative matrix structures and two different sampling techniques: random sampling from the observed stage distribution and equal sampling for all matrix classes. We concentrate on λ because it is commonly used to quantify population performance.
Materials and methods
DEMOGRAPHIC DATA
Cirsium palustre L. (Asteraceae) is a short-lived, monocarpic herb that forms a 50–200 cm high flowering stem in its third summer or later, and reproduction is usually fatal (S. Ramula, personal observation). Primula veris L. (Primulaceae) is a long-lived, iteroparous herb. Both species are rosette-forming, mainly sexually reproducing, and have a persistent seed bank. We used demographic data collected from two Cirsium palustre populations during 2002–2005 in Sweden, and five Primula veris populations during 1996–1998 in Finland. For data description, see Appendix S1 (Supporting Information). Since our aim was to compare the accuracy and precision of matrix models and IPMs in relation to the amount of data, not to predict population dynamics, we pooled the data across the populations and years within the species to increase sample size. Pooling was done for modelling purposes and is rarely recommended for demographic studies (Jongejans & de Kroon 2005). After pooling, we had 1040 individuals for Cirsium and 2155 individuals for Primula. We then sub-sampled these full data sets with samples of 100, 200, 300, …, 1000 individuals. For each sample size, we used 500 replicates, which were randomly drawn from the full data sets without replacement. In addition, we also used an equal sampling, where matrix classes within a species contained equal numbers of individuals, excluding the seed bank. This sampling technique is recommended to minimize sampling error for matrix transitions when the importance of different life stages to λ is unknown (Münzbergová & Ehrlén 2005). Our equal sampling resulted in total sample sizes approximately similar to those gained using sampling from the observed distributions. Because some size classes contained only a small number of individuals, we drew these equal samples from the full data sets with replacement.
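The two sub-sampling schemes described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code; the record layout, the class labels and the function names (`random_subsample`, `equal_subsample`) are assumptions made for the example, and the data are synthetic.

```python
import random

def random_subsample(individuals, n, rng):
    """Draw n individuals without replacement, so that samples follow
    the observed stage distribution on average."""
    return rng.sample(individuals, n)

def equal_subsample(individuals, n_per_class, rng):
    """Draw the same number of individuals from every class, with
    replacement because some classes hold few individuals."""
    by_class = {}
    for ind in individuals:
        by_class.setdefault(ind["stage"], []).append(ind)
    sample = []
    for members in by_class.values():
        sample.extend(rng.choices(members, k=n_per_class))
    return sample

rng = random.Random(1)
# Synthetic stand-in for a pooled data set of 1040 individuals (hypothetical classes).
full_data = [{"id": i, "stage": "flowering" if i % 4 == 0 else "vegetative"}
             for i in range(1040)]

# 500 replicate sub-samples for each sample size 100, 200, ..., 1000.
replicates = {n: [random_subsample(full_data, n, rng) for _ in range(500)]
              for n in range(100, 1001, 100)}
equal = equal_subsample(full_data, 50, rng)
```

Each replicate would then be passed to the model-fitting step, giving 500 estimates of λ per sample size and model.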
MATRIX POPULATION MODELS
A deterministic matrix population model to predict the population state at time t + 1 is denoted as n(t + 1) = An(t), where A is the projection matrix and n(t) is the proportion of individuals in each class at time t. The matrix consists of matrix elements (aij), which describe the average contribution of an individual in stage j to stage i over one time step. To automate the construction of matrices from the sub-samples, we used a slightly different matrix structure from earlier studies for these species (Lehtilä et al. 2006; Ramula 2008). For Cirsium, our transition matrix consisted of six classes nearly identical to those in the original publication (Appendix S1, Supporting Information). For Primula, we pooled small- and medium-sized vegetative plants, resulting in a 5 × 5 matrix (Appendix S1, Supporting Information). For both species, seed bank transitions were estimated separately from the field data by calculating averages across the populations and years.
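As an illustration of how λ follows from a projection matrix, the sketch below iterates the projection n(t + 1) = An(t) until the growth factor stabilizes. Power iteration is used here only for transparency and is not necessarily how the authors computed λ; the 2 × 2 matrix and the function names are hypothetical.

```python
def project(A, n):
    """One time step of n(t + 1) = A n(t)."""
    return [sum(a_ij * n_j for a_ij, n_j in zip(row, n)) for row in A]

def growth_rate(A, iterations=500):
    """Approximate the dominant eigenvalue (lambda) by power iteration."""
    n = [1.0 / len(A)] * len(A)        # start from equal proportions
    lam = 0.0
    for _ in range(iterations):
        nxt = project(A, n)
        lam = sum(nxt)                 # growth factor, because n sums to 1
        n = [v / lam for v in nxt]     # rescale back to proportions
    return lam, n

# Hypothetical matrix; a_ij is the contribution of class j to class i.
A = [[0.5, 1.0],
     [0.5, 0.5]]
lam, stable_stage = growth_rate(A)
```

For this matrix the exact dominant eigenvalue is (1 + √2)/2, and the returned proportions approximate the stable stage distribution.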
In addition to the matrices described above, for both species we used the smallest possible, biologically meaningful matrix dimension consisting of seed bank, seedlings, vegetative plants and flowering plants (Appendix S1, Supporting Information). Further, we reviewed 63 published demographic studies to explore the relationship between the size of demographic data sets and matrix dimensionality using Pearson’s correlation coefficient. For species with multiple matrices available, we used the average sample size.
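For reference, Pearson's correlation coefficient between sample size and matrix dimension can be computed as in the following sketch. The six data pairs are invented for illustration and are not taken from the 63 reviewed studies.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (sample size, matrix dimension) pairs, one per species.
sample_sizes = [62, 150, 263, 480, 900, 2100]
dimensions = [4, 4, 5, 6, 6, 8]
r = pearson_r(sample_sizes, dimensions)
```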
INTEGRAL PROJECTION MODELS
An IPM that contains a seed bank is described using two equations. The first equation describes the number of seeds in the seed bank at time t + 1, i.e. seeds remaining in the seed bank + fresh seeds entering the seed bank, as
\[ S(t+1) = s_s (1 - s_r)\, S(t) + (1 - s_e) \int_X f_s(x)\, n(x,t)\, dx \qquad \text{(eqn 1)} \]
where ss is the constant seed survival in the seed bank, sr is recruitment from the seed bank and se is the establishment rate for fresh seeds. The fecundity function is described as fs(x) = fp(x)fn(x), where fp(x) is the probability of flowering and fn(x) is the number of seeds produced by plants of size x. The second equation describes the density of individuals of size (y) at time t + 1 in the established population, including seedlings that germinate from the seed bank (first part), as
\[
\begin{aligned}
n(y,t+1) &= s_r f_d(y)\, S(t) + \int_X \left[ p(y,x) + f(y,x) \right] n(x,t)\, dx \\
&= s_r f_d(y)\, S(t) + \int_X k(y,x)\, n(x,t)\, dx \qquad \text{(eqn 2)}
\end{aligned}
\]
where the kernel k(y,x) describes all possible transitions from plant size x to plant size y, integrated over all sizes (X). Similar to other studies (Rees & Rose 2002; Rose et al. 2005), we used integration limits of 0.9 times the minimum and 1.1 times the maximum rosette size observed; for evaluating the integrals see Table S1 (Supporting Information). The kernel consists of a survival-growth function, p(y,x), and a fecundity function, f(y,x), which both depend on plant size. For the monocarpic Cirsium, where flowering is fatal, the survival-growth function is p(y,x) = s(x)[1 − fp(x)]g(y,x), where s(x) is the probability of survival for a plant of size x, fp(x) is the probability of flowering for a plant of size x and g(y,x) is the probability of a plant of size x growing to size y. For the iteroparous Primula, the survival-growth function is p(y,x) = s(x)g(y,x). For both species, the growth function, g(y,x), is a normal probability density function with mean and variance. The fecundity function is described as f(y,x) = fp(x)fn(x)se fd(y), where fp(x) is the probability of flowering and fn(x) is the number of seeds produced by plants of size x, and fd(y) is the probability distribution of seedling size with constant mean and variance. As a result of a lack of empirical data, we adopted the same procedure as others (Rees & Rose 2002; Childs et al. 2003; Rose et al. 2005; Williams & Crone 2006) and assumed that seedling size was independent of maternal plant size; matrix models make the same assumption. For the kernel parameters and equations, see Table S1 (Supporting Information).
To calculate the kernels from the data, we constructed regression models with plant size (rosette diameter for Cirsium and leaf length for Primula) at time t + 1 and seed production at time t as response variables and plant size at time t as an explanatory variable. Plant size and seed production were log-transformed in all the
models. We estimated the dependence of plant survival and flowering probability on plant size using a generalized linear model with a logit link function (Table S1, Supporting Information). For each model, we included a quadratic size term and then selected the best model according to Akaike’s information criterion (Burnham & Anderson 1998). The selection of the best model for each sub-sample (termed the best model) allowed regression equations to vary from linear to quadratic depending on the sub-sample of data at hand. An exception was seed production, for which we always used a linear function to avoid drastic overestimates of seed production for small plants resulting from quadratic functions that sometimes fitted best for small data sets. The parameterization of the kernel from the data at hand is preferable for large data sets but it may not be the best solution for small data sets where additional information from other studies may be useful. In such a situation a priori knowledge of the species could be used to define the forms of the regression models. Therefore, we also used a constant model (termed the constant model), in which the forms of regression models were parameterized from the full data set and were kept fixed (Table S1, Supporting Information), while the parameters were estimated from the sub-samples.
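To make the structure of eqns 1 and 2 concrete, the sketch below discretizes them with the midpoint rule into a single projection matrix whose first state is the seed bank, and then approximates λ by power iteration. The midpoint rule is an assumption (the source refers to Table S1 for how the integrals were evaluated), all parameter values are hypothetical rather than the estimates in Table S1, and the function names are illustrative only.

```python
import math

def logistic(a, b, x):
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical parameters on a log-size scale; NOT the estimates in Table S1.
PARAMS = {
    "s_s": 0.5, "s_r": 0.1, "s_e": 0.05,  # seed survival, recruitment, establishment
    "surv": (-1.0, 1.0),                  # logit s(x) = a + b x
    "flow": (-6.0, 1.5),                  # logit fp(x) = a + b x
    "grow": (0.8, 0.7, 0.4),              # mean of g(y, x) = a + b x, and its sd
    "seeds": (1.0, 1.2),                  # log fn(x) = a + b x
    "seedling": (0.5, 0.3),               # mean and sd of fd(y)
}

def build_ipm_matrix(p, lower, upper, m, monocarpic=True):
    """State vector: [seeds in the seed bank, individuals in each of m size cells]."""
    h = (upper - lower) / m
    mesh = [lower + (i + 0.5) * h for i in range(m)]   # cell midpoints
    M = [[0.0] * (m + 1) for _ in range(m + 1)]
    M[0][0] = p["s_s"] * (1.0 - p["s_r"])              # seeds remaining in the bank
    for i, y in enumerate(mesh):                       # recruits from the seed bank
        M[i + 1][0] = h * p["s_r"] * normal_pdf(y, *p["seedling"])
    for j, x in enumerate(mesh):
        s = logistic(*p["surv"], x)
        fp = logistic(*p["flow"], x)
        fn = math.exp(p["seeds"][0] + p["seeds"][1] * x)
        M[0][j + 1] = (1.0 - p["s_e"]) * fp * fn       # fresh seeds entering the bank
        keep = s * (1.0 - fp) if monocarpic else s     # flowering is fatal if monocarpic
        mu = p["grow"][0] + p["grow"][1] * x
        for i, y in enumerate(mesh):
            surv_growth = keep * normal_pdf(y, mu, p["grow"][2])            # p(y, x)
            fecundity = fp * fn * p["s_e"] * normal_pdf(y, *p["seedling"])  # f(y, x)
            M[i + 1][j + 1] = h * (surv_growth + fecundity)
    return M

def growth_rate(M, iterations=200):
    """Approximate the dominant eigenvalue by power iteration."""
    n = [1.0 / len(M)] * len(M)
    lam = 0.0
    for _ in range(iterations):
        nxt = [sum(a * b for a, b in zip(row, n)) for row in M]
        lam = sum(nxt)
        n = [v / lam for v in nxt]
    return lam

M = build_ipm_matrix(PARAMS, lower=-1.0, upper=5.0, m=100)
lam = growth_rate(M)
```

In the constant-model variant described above, the functional forms in `PARAMS` would stay fixed while their coefficients are re-estimated for each sub-sample; in the best-model variant the forms themselves would be chosen per sub-sample.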
MODEL COMPARISON
To examine the accuracy of the demographic models, we calculated λ from each sub-sample and compared the mean λsub-sample with λ estimated from the full data set (hereafter λfull-data) for each model. This reveals whether the models on average produce biased estimates of λ in relation to λfull-data and, if so, in which direction. For equal sampling, we used mean λfull-data for the matrix model and IPM calculated from the full data sets with replacement (for estimates see Fig. S1, Supporting Information). The precision of the models was examined from variances for λsub-sample estimates in relation to variances for λfull-data. We conducted all calculations in R 2.4.1 (R Development Core Team 2006).
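The comparison can be summarized numerically as in the sketch below, which computes the percentage bias as (mean λsub-sample − λfull-data) / λfull-data × 100 and the variance of the sub-sample estimates. The five sub-sample values are invented for illustration, the full-data value is the matrix-model estimate reported for Cirsium, and the use of the sample variance (n − 1 denominator) is an assumption.

```python
from statistics import mean, variance

def percent_bias(lambda_sub, lambda_full):
    """Average bias of sub-sample estimates relative to the full-data estimate."""
    return (mean(lambda_sub) - lambda_full) / lambda_full * 100.0

# Hypothetical sub-sample estimates; a real analysis would use 500 per sample size.
lambda_sub = [1.20, 1.25, 1.18, 1.22, 1.24]
lambda_full = 1.234  # matrix-model estimate for Cirsium from the full data set

bias = percent_bias(lambda_sub, lambda_full)
var = variance(lambda_sub)  # sample variance, n - 1 denominator (assumed)
```

A negative bias indicates that the sub-samples underestimate λ on average relative to the full data set.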
Results
The matrix models and IPMs constructed from the full data sets produced approximately similar estimates of λ (1.234 and 1.221 for Cirsium; 1.331 and 1.301 for Primula respectively), as would be expected. Both models produced quite unbiased and precise estimates of λ for large data sets with the constant IPM usually being most accurate and most precise (Fig. 1). For all the models, the variance of the λ estimates increased with a decreasing amount of data and most rapidly so for the matrix model (Fig. 1c,d). For the smallest data set of 100 individuals, the matrix model resulted in 2.4 times greater variance than the constant IPM for Cirsium and 1.6 times greater variance for Primula (Fig. 1c,d). For small data sets containing fewer than 300 individuals, IPMs thus produced smaller bias and variance for λ than matrix models which generally underestimated λ (Fig. 1). Equal sampling of individuals for the matrix classes did not qualitatively affect the results, and IPMs still tended to produce smaller bias and variance for λ than matrix models for small data sets (Fig. S1, Supporting Information). However, equal sampling somewhat reduced bias and variance in λ for the matrix models of Primula but not for Cirsium (Fig. 1 and S1 in Supporting Information).
Fig. 1. Bias and variance of population growth rate (λ) produced by matrix and integral projection models (IPM) in relation to sample size drawn from observed size distributions. (a–b) Average bias in λ calculated as (mean λsub-sample − λfull-data) / λfull-data × 100 for each model, with zero bias indicated by the dashed line. (c–d) Variance of λ. In the best IPM the structure of regression models was allowed to vary depending on the sub-sample of data used, while in the constant IPM the regression model structure derived from the full data set remained constant. For each sample size, parameters are calculated from 500 sub-samples that are drawn without replacement from the full data sets for two perennial herbs. A 6 × 6 matrix is used for Cirsium and a 5 × 5 matrix for Primula.
For both study species, the smallest possible, biologically meaningful matrices overestimated λ (Fig. 2). Interestingly, a review of published demographic studies revealed that matrix dimensionality increased with an increasing number of individuals in data sets (Fig. 3 and Appendix S2 in Supporting Information). Demographic data sets contained fewer than 300 individuals for 52% of the 63 examined plant species (minimum = 62, median = 263, maximum = 5765 individuals), and such small data sets occurred for both common and rare species (55% and 45% respectively).
Fig. 2. The effect of matrix dimension on population growth rates (λ) and their variance in relation to sample size for two perennial herbs (panels a–d). A 4 × 4 matrix denotes the smallest possible, biologically meaningful matrix dimension. The dashed line indicates λ estimated from the full data set using the larger matrix.
Fig. 3. Relationship between total sample size and matrix dimension estimated from 63 plant demographic studies. Pearson’s correlation coefficient r = 0.535, P < 0.0001; and without the observation with the greatest dimension r = 0.391, P = 0.0017, n = 62.
Discussion
Using demographic data from two perennial herbs with different life histories, we found that IPMs were more suitable for estimating λ from small data sets (