Stochastic Simulation of Forest Regeneration Establishment Using a Multilevel Multivariate Model

Jari Miina and Jaakko Heinonen

Abstract: Here we present a method for simulating high unexplained variation in establishment of tree seedlings in planted Norway spruce (Picea abies [L.] Karst.) stands. The simulation method is based on an existing hierarchical multilevel multivariate model for establishment of regeneration (Miina, J., and T. Saksa. 2006. New For. 32:265–283). The model includes seven simultaneously estimated models for numbers and heights of tree seedlings on 20-m² plots within a regeneration area. The variation in plot-level expectations is described by fixed effects and two nested multivariate normal random effects. Conditionally, given the expected values, one of the count variables is underdispersed and four are overdispersed relative to Poisson law. The conditional distributions of the square roots of the height variables are normal. For the conditional joint distribution, the model defines the product-moment cross-correlation matrix, which includes both positive and negative cross-correlations. The first two moments do not, however, uniquely define the marginal distributions of the count variables or the joint distribution of all seven variables. We applied binomial distribution for the underdispersed Poisson variation and negative binomial distribution for the overdispersed Poisson variation. The conditional joint distribution was defined by a multivariate normal copula and the marginal distributions. Despite relatively low cross-correlations among responses, if the cross-correlations among random components of the responses were ignored in stochastic simulations, the amount and costs of precommercial thinning (i.e., cleaning out competing broadleaves) were greatly underestimated. If all random effect components in model parameters were dropped, the error increased markedly. FOR. SCI. 54(2):206–219.
Keywords: generalized linear mixed models, normal copula, overdispersion
In forest modeling, prediction of regeneration dynamics is very difficult and thus is often neglected, partly due to unmeasured temporal and spatial processes that affect the birth, growth, damage and mortality of seedlings (e.g., Kozlowski 2002). Even if site preparation and planting, for example, are used to promote establishment and development of regeneration, there is still high variation in the result of forest regeneration. Because of the features of forest regeneration, the stochastic variation associated with the parameter estimates of the statistical regeneration models is typically very high; however, predictions derived from these models are unbiased (e.g., Ferguson et al. 1986, Vanclay 1992, Golser and Hasenauer 1997, Schweiger and Sterba 1997, Miina and Saksa 2006). To imitate the total variance in response, residual variance should be taken into account in predictions (cf. Miina 1993, Stage and Wykoff 1993). Therefore, forest regeneration models must be inherently stochastic. In the case of a multivariate model, cross-correlations among responses need to be identified and taken into account in stochastic simulations. For these reasons, the applicability of linear regression for predicting forest regeneration has been found to be limited (e.g., Robinson and Kurtz 1998). Nonparametric methods such as imputation models, which do not require distributional assumptions and are multivariate, i.e., provide estimates of many responses (species, number of trees, height class, etc.) at one time, could be applied if no representative data for statistical regeneration modeling are available (Ek et al. 1997, Hassani et al. 2004).
The multilevel multivariate modeling approach applied to regeneration inventory data by Miina and Saksa (2006) takes into account the variation in establishment of forest regeneration at different hierarchical levels (municipality, regeneration area, and sample plot). In addition, it allows simultaneous prediction of several responses measured from the same plots, which has not previously been considered in regeneration modeling (Miina et al. 2006). In multilevel multivariate modeling, the unexplained variation is captured by means of random effects (e.g., Searle et al. 1992). A random component may be related to the intercept or to specific predictors (see Fang and Bailey 2001 for detailed discussions). With the parameters of the random effects, it is possible to simulate the high unexplained structured variation in establishment of regeneration. The variability of the model can be analyzed by incorporating stochastic structure in model predictions. This means that the model is run by adding, for example, the residual and annual variation in tree growth to the growth predictions, and the variability is estimated using the model outputs (Kangas 1998). This so-called stochastic simulation (e.g., Ripley 1987) allows the structured spatial, temporal or nested stochastic components, and unstructured stochastic components to be preserved in predictions. In addition to unexplained variation in the model, there are also other sources of prediction error, e.g., model misspecification, random estimation errors in model parameters, and errors in independent variables (Kangas 1999), but these are beyond the scope of this study.
Jari Miina, Finnish Forest Research Institute, Joensuu Research Unit, Joensuu, FI-80101, Finland—Phone: 358102113106;
[email protected]. Jaakko Heinonen, Finnish Forest Research Institute, Joensuu Research Unit, Joensuu, FI-80101, Finland—
[email protected]. Manuscript received March 13, 2007, accepted September 25, 2007
Forest Science 54(2) 2008
Copyright © 2008 by the Society of American Foresters
Stochastic individual-tree models used in forest management have been presented and their benefits over deterministic ones have been discussed by Fox et al. (2001). Stochastic simulation systems have been used to optimize silvicultural prescriptions by, e.g., Valsta (1992). He has demonstrated that deterministic simulations can lead to biased conclusions and nonoptimal forest management. Pukkala and Kolström (1992) have derived a stochastic regeneration model from relations that describe forest regeneration, including amount and quality of the seed crop, seed dispersal, germination and predation of seeds, and survival and growth of seedlings. The model allows us to evaluate the risk of failure in a natural regeneration. Zhou (1999) has selected the optimal regeneration method by using a stochastic simulation and optimization system and considering the stochastic variation in the number of regenerated seedlings. Although stochastic models are more realistic than deterministic ones, studies on the application of stochastic models in decisions involving regeneration treatments are rare.

The aim of this study was to develop a system for generating stochastic predictions using the existing model of regeneration establishment for planted Norway spruce (Picea abies [L.] Karst.) stands. This model predicts the numbers of five classes of tree species and the heights of crop-tree spruces and deciduous trees on 20-m² sample plots within regeneration areas (Miina and Saksa 2006). The next section introduces the multilevel multivariate model applied. The method is then described so that it includes structured stochastic components in the model predictions. Finally, regeneration results predicted using deterministic and stochastic simulations are evaluated at both regeneration area and sample plot levels.
It is hypothesized that ignoring the cross-correlations among stochastic components in predictions will affect the silvicultural decisions made on the basis of the simulated regeneration result.
Model for Establishment of Regeneration

The model used in this study predicts the establishment of regeneration on planted 3-year-old Norway spruce (Picea abies [L.] Karst.) stands in southern Finland (Miina and Saksa 2006). Establishment of tree seedlings was described by seven response variables: number of planted spruces, natural Scots pines (Pinus sylvestris L.), natural spruces, natural seed-origin birches (Betula pubescens Ehrh. and Betula pendula Roth.) and other broadleaves (i.e., sprout-origin birches and broadleaves other than birch), as well as height of crop-tree spruce and dominant height of broadleaves. The model was estimated using regeneration data for planted Norway spruce stands surveyed in 40 municipalities in 2000–2002. About 15 temporary sample plots (20 m²) were systematically distributed over the regeneration areas (N = 1,541). Site characteristics, method of site preparation, target tree species, and regeneration method were determined for each plot. On a given plot (N = 22,007), established seedlings were assessed by counting the number of seedlings in five tree species classes. For each tree species class, the number of seedlings was right-censored to 20 seedlings per plot (i.e., the maximum number of seedlings was set to 20 seedlings per plot and per tree species class). The height of a crop tree, in most cases a planted spruce, closest to the center of the plot and the dominant height of deciduous tree species (the height of the tallest deciduous tree) were also measured.

Because of the hierarchical structure of the data and the simultaneous determination of the extent and composition of regeneration, multilevel multivariate modeling was applied (Snijders and Bosker 1999, Goldstein 2003). First, the data had a multilevel hierarchy, i.e., sample plots within regeneration areas within municipalities. Consequently, plot-level outcomes of an individual response variable (i.e., numbers of trees and height measurements) were auto-correlated, which was taken into account by random effects at different levels in the variance component models where the intercept varied randomly across levels (Searle et al. 1992). Second, the data had a multivariate structure because the responses were cross-correlated, and thus the random effects and the random errors of the models were cross-correlated. The temperature sum was the only continuous predictor in the model, whereas other predictors (site fertility, soil characteristics, and methods of site preparation) were included using dummy variables. The model was fitted using MLwiN software and applying the combination of the iterative generalized least-squares (IGLS) and the first-order marginal quasi-likelihood (MQL) procedures (Rasbash et al. 2005).

Extra-Poisson Multilevel Models for the Number of Trees

The general form of the Poisson multilevel model for number of trees in tree species class $l$ ($l = 1, 2, \ldots, 5$) was

$$n_{lmsp} \sim \mathrm{Poisson}(\mu_{lmsp}), \qquad \mu_{lmsp} = \exp\{f(X_{lmsp}, \beta_l) + u_{lm} + u_{lms}\}, \tag{1}$$

where $n_{lmsp}$ is the number of trees on plot $msp$, the conditional distribution of $n_{lmsp}$, given the expected value $\mu_{lmsp}$, is a Poisson distribution, $f(\cdot)$ is the fixed part of the model, $X_{lmsp}$ is a vector of fixed predictors, and $\beta_l$ is a vector of fixed parameters. Subscripts $p$, $s$, and $m$ refer to plot $p$, regeneration area (later referred to as stand) $s$, and municipality $m$, respectively. $u_{lm}$ and $u_{lms}$ are random, normally distributed between-municipality and between-stand effects with a mean of 0 and constant variances $\sigma^2_{lm}$ and $\sigma^2_{lms}$, respectively. Note that the forestry center-level random effect is omitted here because it was not significant in the original model. The plot-level random errors are defined as the deviations between random realizations and conditional plot means. The random effects and errors of the models are assumed to be cross-correlated at the same level and uncorrelated across different levels.

For the Poisson distribution, the conditional variance $\mathrm{var}(n_{lmsp}) = \mu_{lmsp}$, given the random effects. Extra-Poisson variation was obtained by assuming that the conditional variance is only proportional to the mean: $\mathrm{var}(n_{lmsp}) = \phi_l \times \mu_{lmsp}$, where the dispersion parameters $\phi_l$ were estimated from the data. Because the estimates of the dispersion parameters were not equal to 1, the model defines only the first two moments of the conditional distribution of $n_{lmsp}$.
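As a minimal illustration of Equation 1 and the extra-Poisson assumption, the following Python sketch computes the conditional mean and variance of a count on a single plot. The function names and the numeric inputs are hypothetical and serve only as an example; the dispersion value 0.6 is the estimate reported in this article for planted spruces.

```python
import math

def plot_mean(fixed_part, u_municipality, u_stand):
    """Conditional mean of Equation 1: exp(fixed part + random effects)."""
    return math.exp(fixed_part + u_municipality + u_stand)

def extra_poisson_variance(mu, phi):
    """Conditional variance assumed proportional to the mean: phi * mu."""
    return phi * mu

# Hypothetical linear predictor and random-effect realizations.
mu = plot_mean(fixed_part=1.0, u_municipality=0.1, u_stand=-0.2)
print(mu, extra_poisson_variance(mu, phi=0.6))
```

With a dispersion below 1 the variance is smaller than the Poisson variance, and with a dispersion above 1 it is larger, but in both cases only the mean and the variance are fixed by the model.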
Normal Multilevel Models for Tree Heights

The height of the crop-tree spruce and the dominant height of broadleaves were square root-transformed and modeled by fitting normal multilevel models. The general form of the normal multilevel model for height variable $l$ ($l = 6, 7$) was

$$\sqrt{h_{lmsp}} = f(X_{lmsp}, \beta_l) + u_{lm} + u_{lms} + e_{lmsp}, \tag{2}$$

where index 6 is for the height of the crop-tree spruce (cm) and index 7 is for the dominant height of broadleaves (cm) and $e_{lmsp}$ is the normally distributed error term at the plot level with a mean of 0 and constant variance $\sigma^2_{lmsp}$.
Variance and Cross-Correlation among Random Components

The fixed parameters and variance components of the models were parameterized directly in the multilevel multivariate model estimated by MLwiN. The parameter estimates of the fixed parts of the regeneration model are presented by Miina and Saksa (2006). The variance-covariance matrices of the random effects and the random errors are repeated here (Table 1) because they are of the greatest importance when the model is applied in stochastic simulations. The stand-level residual variation $\sigma^2_{lms}$ of the models for the number of trees was 2–7 times higher than the municipality-level variation $\sigma^2_{lm}$. The dispersion parameters $\phi_l$ were interpreted to indicate how the spatial distribution of trees differed from the Poisson distribution (cf. Griffith and Haining 2006). The estimated dispersion parameter for the number of planted spruces was 0.6, which indicates that planted spruces were underdispersed (i.e., more evenly distributed than expected in the Poisson process). Dispersion parameters for natural seedlings varied between 3.4 and 4.4, which indicates that natural seedlings were overdispersed (i.e., clustered in the stand). Extra-Poisson variation could also have been caused, for example, by unmeasured heterogeneity in site quality. In the normal models for crop-tree height and dominant height of broadleaves, most of the unexplained variation (64 and 47%, respectively) was found at the plot level.

The cross-correlations among the random effects at the municipality level were not significant (P > 0.05), and they were therefore set at zero in the model. In absolute terms, the stand-level cross-correlations were slightly higher than the corresponding (partial) cross-correlations at plot level (Table 1). All statistically significant cross-correlations were logical and shared mostly the same sign (+/-) at the stand and plot levels. For example, the random effects of the model for number of planted spruces correlated significantly and positively with those for height of the crop-tree spruce. This means that if there were more planted spruces than predicted by the model, the measured height of the crop trees was also greater than the predicted height (i.e., seedlings on successfully regenerated plots were larger than those on poorly regenerated plots).
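The plot-level shares quoted for the two height models follow directly from the variance components in Table 1. The short Python sketch below recomputes them; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Variance components for the two height models, copied from Table 1.
components = {
    "h6": {"municipality": 0.1617, "stand": 0.4467, "plot": 1.0907},
    "h7": {"municipality": 0.8091, "stand": 2.4222, "plot": 2.9217},
}

def plot_level_share(parts):
    """Fraction of the total unexplained variance found at the plot level."""
    return parts["plot"] / sum(parts.values())

shares = {name: plot_level_share(parts) for name, parts in components.items()}
for name, share in shares.items():
    print(f"{name}: {100 * share:.0f}% of unexplained variance at plot level")
# h6: 64% of unexplained variance at plot level
# h7: 47% of unexplained variance at plot level
```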
Stochastic Simulation of Regeneration Establishment

Stochastic Variation at Municipality and Stand Levels

The establishment of regeneration was stochastically simulated by predicting the numbers of trees and tree heights using the fixed part of each model and adding realizations of the random effects to the linear predictions (Figure 1). The simulation was programmed and the random numbers were generated in the R-environment (R Development Core Team 2005). For seven models ($l = 1, 2, \ldots, 7$), the municipality-level random effects $u_{lm} = (u_{1m}, u_{2m}, \ldots, u_{7m})$ were sampled from an uncorrelated multivariate normal (MVN) distribution and the stand-level random effects $u_{lms} = (u_{1ms}, u_{2ms}, \ldots, u_{7ms})$ from a correlated MVN distribution. Each sum $u_{lm} + u_{lms}$ was truncated at $\pm 2.5 \times \sqrt{\sigma^2_{lm} + \sigma^2_{lms}}$ to reject very small and large values associated with a normal distribution. Note that, in this article, the same symbol refers both to a parameter of the model and to its estimate. In the following text, however, the estimates computed by MLwiN are indicated by a “hat” (e.g., $\hat{\beta}$).
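The sampling step can be sketched in Python for two of the seven responses, natural pines (n2) and natural spruces (n3), using the variances and the stand-level cross-correlation from Table 1. This is a sketch under stated assumptions rather than the authors' R code: truncation is implemented as clipping to the bound, a small hand-written Cholesky factorization stands in for a library MVN sampler, a single draw is shown instead of one draw per municipality and per stand, and all function and variable names are hypothetical.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def sample_mvn_zero_mean(cov, rng):
    """One draw from a zero-mean multivariate normal with covariance cov."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]

def clip(value, bound):
    """Assumed truncation rule: values beyond +/-bound are set to the bound."""
    return max(-bound, min(bound, value))

# Table 1 values for natural pines (n2) and natural spruces (n3).
var_municipality = [0.3720, 0.2011]
var_stand = [2.4695, 1.5020]
stand_corr = 0.338

covariance = stand_corr * math.sqrt(var_stand[0] * var_stand[1])
cov_stand = [[var_stand[0], covariance], [covariance, var_stand[1]]]

def simulate_random_effect_sums(rng):
    """Sum of municipality- and stand-level effects, limited to +/-2.5 SD."""
    u_m = [rng.gauss(0.0, math.sqrt(v)) for v in var_municipality]  # uncorrelated
    u_s = sample_mvn_zero_mean(cov_stand, rng)                      # cross-correlated
    bounds = [2.5 * math.sqrt(vm + vs) for vm, vs in zip(var_municipality, var_stand)]
    return [clip(a + b, bound) for a, b, bound in zip(u_m, u_s, bounds)]

rng = random.Random(1)
print(simulate_random_effect_sums(rng))
```

Setting `stand_corr` to zero gives the uncorrelated variant of the simulation.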
Table 1. Variance components of and cross-correlations among random effects and errors of the models for number of planted spruces (n1), natural pines (n2), natural spruces (n3), seed-origin birches (n4) and other broadleaves (n5), as well as height of crop-tree spruce (h6) and dominant height of broadleaves (h7)

Model for*                                          n1       n2       n3       n4       n5       h6       h7

Variance components at
  Municipality level ($\sigma^2_{lm}$)            0.0157   0.3720   0.2011   0.2931   0.0618   0.1617   0.8091
  Stand level ($\sigma^2_{lms}$)                  0.0695   2.4695   1.5020   0.5913   0.1872   0.4467   2.4222
  Plot level ($\phi_l$ or $\sigma^2_{lmsp}$)      0.6045   3.9606   4.3580   4.1260   3.4332   1.0907   2.9217

Cross-correlations at stand (upper triangle) and plot level (lower triangle)
  n1                                              1        0.058   -0.034   -0.050   -0.019    0.190   -0.009
  n2                                              0.118    1        0.338    0.322    0.016    0.095   -0.066
  n3                                              0.049    0.184    1        0.212    0.075    0.071    0.005
  n4                                              0.067    0.132    0.108    1       -0.053   -0.005   -0.058
  n5                                              0.028    0.009    0.060   -0.026    1        0.066    0.377
  h6                                              0.139   -0.007    0.001   -0.014    0.006    1        0.325
  h7                                             -0.031   -0.053    0.008   -0.058    0.285    0.077    1

* Statistically insignificant (P > 0.05) estimates are in italics.
Stochastic predictions?

No: Calculate the deterministic predictions using the given fixed effects $X_{lmsp}$:
- $n_{lmsp} = \exp\{f(X_{lmsp}, \beta_l)\}$, $l = 1$–$5$
- $h_{lmsp} = f(X_{lmsp}, \beta_l)^2$, $l = 6, 7$

Yes: Calculate the plot-level means:
- Calculate $f(X_{lmsp}, \beta_l)$ using the given fixed effects $X_{lmsp}$.
- Simulate the mutually independent municipality-level random effects ($u_{lm}$).
- Simulate the cross-correlated stand-level random effects ($u_{lms}$). In uncorrelated simulations, set cross-correlations to zero.
- Truncate $u_{lm} + u_{lms}$ at $\pm 2.5 \times \sqrt{\sigma^2_{lm} + \sigma^2_{lms}}$.
- Determine coefficient $c_l$ to take into account the effect of truncation of random effects.
- Calculate $\mu_{lmsp} = \exp\{f(X_{lmsp}, \beta_l) + u_{lm} + u_{lms}\} / \{c_l \exp((\sigma^2_{lm} + \sigma^2_{lms})/2)\}$, $l = 1$–$5$, and $\mu_{lmsp} = f(X_{lmsp}, \beta_l) + u_{lm} + u_{lms}$, $l = 6, 7$.

Then calculate the plot-level predictions:
- Determine the cross-correlation matrix $\Sigma^N_{msp}$ for the MVN distribution as a function of the target marginal distributions. In uncorrelated simulations, set cross-correlations to zero.
- Sample $x = (x_1, x_2, \ldots, x_7) \sim \mathrm{MVN}(\mu_{msp}, \Sigma^N_{msp})$.
- Transform $x_l$ through the copula to binomial ($l = 1$) and negative binomial ($l = 2$–$5$) variates (i.e., the counts $n_{lmsp}$) using the cumulative probability of $x_l$ and the inverse of the marginal distribution function of the variable $l$.
- Right-censor the counts $n_{lmsp}$ to 20.
- For tree heights ($l = 6, 7$), set the negative $x_l$ to zero and calculate $h_{lmsp} = x_l^2$.

Figure 1. Flow chart of the procedure used to simulate stochastic, cross-correlated predictions.
For the count variables ($l = 1, 2, \ldots, 5$), the conditional distribution of the plot-level expected value $\mu_{lmsp} \mid X_{lmsp}$, given the fixed factors $X_{lmsp}$, is a lognormal distribution. The parameter estimates $\hat{\beta}$ computed using MQL are such that function $\exp\{f(X_{lmsp}, \hat{\beta}_l)\}$ is a consistent estimate for the expectation

$$E[\mu_{lmsp} \mid X_{lmsp}] = E[\exp\{f(X_{lmsp}, \beta_l) + u_{lm} + u_{lms}\}] = \exp\{f(X_{lmsp}, \beta_l) + (\sigma^2_{lm} + \sigma^2_{lms})/2\}$$

(Breslow and Clayton 1993, Goldstein 2003). Hence, the fixed part $\exp\{f(X_{lmsp}, \beta_l)\}$ can be estimated by $\exp\{f(X_{lmsp}, \hat{\beta}_l)\} / \exp\{(\hat{\sigma}^2_{lm} + \hat{\sigma}^2_{lms})/2\}$.

In simulations, random effects were truncated and the total variance is less than $\sigma^2_{lm} + \sigma^2_{lms}$. The effect of truncation was taken into account by a correction coefficient $c_l$ ($\le 1$), and the fixed part of count models was computed as

$$\exp\{f(X_{lmsp}, \beta_l)\} = \frac{\exp\{f(X_{lmsp}, \hat{\beta}_l)\}}{c_l \exp\{(\hat{\sigma}^2_{lm} + \hat{\sigma}^2_{lms})/2\}}.$$

The value of the coefficient $c_l$ was determined by simulations. A sample $x_{ln} = (x_{l1}, x_{l2}, \ldots, x_{ln})$, $n = 10{,}000$, was drawn from the normal distribution with a mean of 0 and variance $\sigma^2_l = \hat{\sigma}^2_{lm} + \hat{\sigma}^2_{lms}$, and the values were truncated at $\pm 2.5 \times \sqrt{\sigma^2_l}$. Let $x'_{ln}$ be the truncated value of $x_{ln}$ and let $\bar{y}_l$ be the mean of $\exp\{x'_{ln}\}$, then $c_l = \bar{y}_l / \exp\{\sigma^2_l/2\}$. The coefficients $c_1$–$c_5$ for five classes of tree species were 1.00, 0.89, 0.95, 0.98, and 1.00, respectively. In our application, the correction coefficients had only a minor effect on the simulation results.

Finally, the plot-level expectations of the count variables ($l = 1, 2, \ldots, 5$)

$$\mu_{lmsp} \mid X_{lmsp}, u_{lm}, u_{lms} = \frac{\exp\{f(X_{lmsp}, \hat{\beta}_l) + u_{lm} + u_{lms}\}}{c_l \exp\{(\hat{\sigma}^2_{lm} + \hat{\sigma}^2_{lms})/2\}}$$

were right-censored to 20 seedlings per plot and per tree species class, as was done in the modeling data. No transformation correction was needed for the predictions of the plot-level expectations of the height variables ($l = 6, 7$), and $\mu_{lmsp} \mid X_{lmsp}, u_{lm}, u_{lms} = f(X_{lmsp}, \hat{\beta}_l) + u_{lm} + u_{lms}$.
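The simulation used to determine the correction coefficient can be sketched in Python as follows. The sketch assumes that "truncated" means that values beyond the bound are set to the bound (clipping); under this assumption the estimates should come out close to the coefficients reported above. The function name, the seed, and the larger sample size (200,000 instead of 10,000, chosen to reduce Monte Carlo noise) are illustrative choices, and the variances are the sums of the municipality- and stand-level components in Table 1.

```python
import math
import random

def correction_coefficient(variance, n=200_000, k=2.5, seed=0):
    """Monte Carlo estimate of c = mean(exp(x')) / exp(variance / 2).

    x is drawn from N(0, variance) and x' is x limited to +/- k standard
    deviations (assumption: out-of-range values are set to the bound).
    """
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    bound = k * sd
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sd)
        total += math.exp(max(-bound, min(bound, x)))
    return (total / n) / math.exp(variance / 2.0)

# Sum of municipality- and stand-level variances from Table 1.
total_variance = {
    "n1": 0.0157 + 0.0695,
    "n2": 0.3720 + 2.4695,
    "n3": 0.2011 + 1.5020,
    "n4": 0.2931 + 0.5913,
    "n5": 0.0618 + 0.1872,
}
for name, variance in total_variance.items():
    print(name, round(correction_coefficient(variance), 2))
```

The coefficient departs from 1 only when the combined variance is large, which is why natural pines (n2) need the strongest correction.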
Stochastic Variation at Plot Level

Because of the free dispersion parameters $\phi_l$, the regeneration model defines the mean $= \mu_{lmsp}$ and the variance $= \phi_l \times \mu_{lmsp}$ ($l = 1, 2, \ldots, 5$) for the conditional (plot-level) extra-Poisson variation of the five count variables, but it does not specify the whole distributions. In this study, the underdispersed count variable was assumed to have a binomial distribution, whereas a negative binomial distribution (gamma mixture of Poisson distributions) was applied for the overdispersed count variables. The conditional joint distribution of the square root of the height variables is a bivariate normal distribution, and all seven variables are cross-correlated with both positive and negative Pearson’s correlation coefficients (Table 1). The regeneration model gives the pairwise cross-correlations of the marginal distributions, but it does not specify the joint distribution. In this study, the joint distribution was defined by a MVN copula.

According to Sklar’s theorem (Nelsen 2006), an $n$-dimensional distribution function $H(x_1, x_2, \ldots, x_n)$ with margins $F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$ can be expressed in the form $C(F_1(x_1), F_2(x_2), \ldots, F_n(x_n))$, where $C$ is a function called a copula ($n$-copula). Furthermore, if $C$ is an $n$-copula and $F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$ are distribution functions, then $C(F_1(x_1), F_2(x_2), \ldots, F_n(x_n))$ is an $n$-dimensional distribution function with margins $F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$. An $n$-copula is a function from the $n$-cube $[0, 1]^n$ to interval $[0, 1]$ that fulfils certain regularity conditions, and if the margins are continuous, it is unique in $[0, 1]^n$. In the case of random variates, it is an $n$-dimensional distribution function whose one-dimensional margins are uniform on the interval $[0, 1]$. A copula can be used to define the joint distribution for an arbitrary set of marginal distributions.
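For reference, the MVN (Gaussian) copula can be written explicitly. In the expression below, $R$ denotes a correlation matrix, $\Phi_R$ the distribution function of the $n$-dimensional normal distribution with zero means, unit variances, and correlation matrix $R$, and $\Phi^{-1}$ the standard normal quantile function. These symbols are introduced here only for illustration and are not part of the original notation.

```latex
C_R(u_1, \ldots, u_n) = \Phi_R\bigl(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\bigr),
\qquad
H(x_1, \ldots, x_n) = C_R\bigl(F_1(x_1), \ldots, F_n(x_n)\bigr).
```

The second expression is Sklar's representation with the Gaussian copula substituted for $C$: the margins $F_1, \ldots, F_n$ can be chosen freely, and the dependence enters only through $R$.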
Every continuous multivariate distribution determines a copula, and we defined the joint distribution of the margins by the copula of a seven-dimensional MVN distribution. In the simulation, we sampled from a seven-dimensional MVN distribution, computed the cumulative probabilities of five coordinates that correspond to the count variables, and used the inverse distribution function of binomial or negative binomial distributions to transform the probabilities to the quantiles (i.e., counts). As a result, a sample from a MVN distribution was transformed to a sample from the joint distribution of one binomial, four negative binomial, and two normal variates (target distribution).

Different copula functions define different joint distributions for the given set of marginal distributions. The MVN copula was suitable for our purposes in the sense that it can be defined uniquely by the pairwise cross-correlations between the margins. However, the transformation from a normal distribution to a binomial or negative binomial distribution is nonlinear, and the continuous variables are coarsened to integers. For these reasons, the pairwise Pearson’s correlation coefficients of the MVN distribution differ from the corresponding cross-correlations of the target distribution. Differences between the cross-correlations depend on the values of the parameters of the target marginal distributions, and, as far as we know, there is no closed form expression for the differences. We used numerical approximations and interpolations to determine the pairwise cross-correlations for the MVN distribution. The effect of right-censoring on the cross-correlations was not taken into account (see also interpretation of the mean parameter $\mu_{lmsp}$ later in this section). The details of this method are described in the Appendix.

In simulations we scaled the mean values and variances of the MVN distribution according to the target distribution. A copula does not depend on scaling of the margins, and the standardized MVN distribution (with marginal distributions of mean = 0 and variance = 1) generates the same target distribution. After the variance-covariance matrix $\Sigma^N_{msp}$ was assessed for the MVN distribution (see Appendix), a sample $x = (x_1, x_2, x_3, \ldots, x_7)$ was drawn from $\mathrm{MVN}(\mu_{msp}, \Sigma^N_{msp})$ using the R function mvrnorm. Predicted values of the count variables $n_{lmsp}$ ($l = 1, 2, \ldots, 5$) were computed using the equation $n_{lmsp} = F^{-1}_{lmsp}(\Phi(x_l; \mu_{lmsp}, \sigma^2_{lmsp}))$, where $\sigma^2_{lmsp}$ is the variance of coordinate $l$, $\Phi(x_l; \mu_{lmsp}, \sigma^2_{lmsp}) = p_l$ is the cumulative probability of $x_l$ for normal distribution $N(\mu_{lmsp}, \sigma^2_{lmsp})$, and $F^{-1}_{lmsp}$ is the inverse of the marginal distribution function for the count variable $l$ on plot $msp$. The probabilities $p_l$ were computed using the R-function pnorm.

The first coordinate $x_1$ of a sample $x$ corresponded to the underdispersed count variable (planted spruce), and its distribution was assumed to be binomial. The value of its inverse probability function $F^{-1}_{1msp}(p_1)$ was computed using the R-function qbinom(p1, sizebin, pbin), where sizebin is the number of trials and pbin is the probability of success for one trial. For binomial distribution, the mean $\mu$ = sizebin × pbin and the variance $\sigma^2$ = sizebin × pbin × (1 – pbin). If $\mu = \mu_{1msp}$ and $\sigma^2 = \phi_1 \times \mu_{1msp}$, then pbin = $1 - \phi_1$ and sizebin = $\mu_{1msp}$/pbin. The number of trials must be a positive integer, which was obtained by the function sizebin = max(1, trunc($\mu_{1msp}$/pbin + 0.5)).
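The binomial step can be illustrated with a small Python sketch that mirrors the moment matching and the inverse-distribution transform described above. It is not the authors' R code: `qbinom` here is a hand-written stand-in for the R function of the same name, the helper names are hypothetical, and the input values (a mean of 2.0 with the reported dispersion 0.6) are chosen only for illustration.

```python
import math
from statistics import NormalDist

def binomial_parameters(mu, phi):
    """Match mean mu and variance phi * mu (phi < 1) with a binomial."""
    p_bin = 1.0 - phi
    size_bin = max(1, math.trunc(mu / p_bin + 0.5))  # must be a positive integer
    return size_bin, p_bin

def qbinom(p, size, prob):
    """Smallest k with P(X <= k) >= p for X ~ Binomial(size, prob)."""
    cumulative = 0.0
    for k in range(size + 1):
        cumulative += math.comb(size, k) * prob**k * (1.0 - prob) ** (size - k)
        if cumulative >= p:
            return k
    return size

def normal_to_binomial(x, mean, sd, mu, phi):
    """Map one normal coordinate to a binomial count through its probability."""
    p = NormalDist(mean, sd).cdf(x)
    size_bin, p_bin = binomial_parameters(mu, phi)
    return qbinom(p, size_bin, p_bin)

size_bin, p_bin = binomial_parameters(mu=2.0, phi=0.6)
print(size_bin, round(p_bin, 3))  # 5 0.4
print(normal_to_binomial(x=0.0, mean=0.0, sd=1.0, mu=2.0, phi=0.6))  # 2
```

Because the number of trials is rounded to an integer, the mean and variance of the simulated binomial match the targets only approximately for most values of the plot mean.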
The marginal distributions of the overdispersed count variables n2–n5 were assumed to be negative binomial distributions, and the values $F^{-1}_{lmsp}(p_l)$ ($l = 2, 3, 4, 5$) were computed using the R-function qnbinom(pl, sizenbin, , $\mu$), where $\mu$ is the mean value and sizenbin is called the shape parameter of the gamma-Poisson mixture distribution. If the mean $\mu = \mu_{lmsp}$ and variance $\sigma^2 = \phi_l \times \mu_{lmsp}$, then sizenbin = $\mu_{lmsp}/(\phi_l - 1)$. The coordinates $x_6$ and $x_7$ correspond to the square root of two height variables whose predicted values were $h_{6msp} = \max(0, x_6)^2$ and $h_{7msp} = \max(0, x_7)^2$, respectively.

All the counts were right-censored to 20, i.e., $n_{lmsp} = \min(20, n_{lmsp})$, as was the case in the modeling data because the stochastically simulated data were subsequently used to refit the model. The method of estimation for the regeneration establishment model did not take into account the effect of right-censoring, and the mean parameter $\mu_{lmsp}$ of count variable $l$ refers to the mean after right-censoring. In simulations, we treated this as the mean before right-censoring. Therefore, the simulated overall averages were lower than the corresponding data values. This bias can be avoided by replacing the parameter $\mu_{lmsp}$ by $\mu^b_{lmsp}$, the mean before right-censoring. However, the value of $\mu^b_{lmsp}$ has to be solved by iteration for each variable on each plot, and in our simulations we did not compute these values.
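The negative binomial step, the right-censoring, and the back-transformation of heights can be sketched in the same way. The Python code below is an illustration under stated assumptions: `qnbinom` is a hand-written stand-in for the R function, using the (size, mean) parameterization named in the text, the `max_count` safety limit and all helper names are hypothetical, and the numeric inputs are invented for the example.

```python
import math

def negative_binomial_size(mu, phi):
    """Shape parameter giving mean mu and variance phi * mu (phi > 1)."""
    return mu / (phi - 1.0)

def nbinom_pmf(k, size, mu):
    """Negative binomial probability in the (size, mu) parameterization."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)

def qnbinom(p, size, mu, max_count=10_000):
    """Smallest k with P(X <= k) >= p; max_count is a safety limit."""
    cumulative = 0.0
    for k in range(max_count + 1):
        cumulative += nbinom_pmf(k, size, mu)
        if cumulative >= p:
            return k
    return max_count

def censored_count(p, mu, phi, limit=20):
    """Quantile of the matched negative binomial, right-censored at limit."""
    return min(limit, qnbinom(p, negative_binomial_size(mu, phi), mu))

def height_from_sqrt(x):
    """Back-transform a simulated square-root height, flooring at zero."""
    return max(0.0, x) ** 2

print(negative_binomial_size(mu=3.0, phi=4.0))     # 1.0
print(censored_count(p=0.5, mu=3.0, phi=4.0))      # 2
print(censored_count(p=0.999, mu=15.0, phi=4.0))   # 20 (censored)
print(height_from_sqrt(-0.3), height_from_sqrt(6.0))  # 0.0 36.0
```

The third call shows why censoring lowers the simulated averages when the mean parameter is treated as the mean before censoring: upper-tail draws are pulled down to 20 and nothing compensates for them.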
Simulations

The establishment of regeneration for the modeling data was simulated both deterministically and stochastically. In deterministic simulations, only the fixed part of each model was used, and thus the predicted numbers of trees were not integer values. In stochastic simulations, random components at different levels were included in the predictions. The municipality-level random effects of the model were always uncorrelated, but both cross-correlated and uncorrelated stand-level random effects and plot-level random errors were generated to study how ignoring the cross-correlations among the random components affected the simulated regeneration result. Because of limited computing resources, stochastic simulations were repeated 10 times using different seed numbers for random number generators in R. To make the comparison of the simulation results as precise as possible, the same seed numbers were used in both cross-correlated and uncorrelated simulations.

The simulated and observed numbers of different tree species classes as well as their sums were compared at both stand and plot levels. Note that to study the effect of cross-correlation among the stochastic variables on the sum of generated variables, simulations are not needed, but the mean and variance of the sum can be calculated simply from the mean vector and the variance-covariance matrix of the stochastic variables. In contrast, simulations are needed if, for example, the distributions of stochastic variables or their sum is of interest. In this study, the distribution of the regeneration result and the number of stands needing precommercial thinning (i.e., cleaning out competing broadleaves) were calculated and analyzed. In addition, simulations were used to develop and evaluate the simulation system to be connected to a stand simulator where it would facilitate predictions of stand structure after regeneration cutting and spruce planting.
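The remark about the sum of the generated variables can be made concrete with a few lines of Python. The mean of a sum is the sum of the means, and its variance is the sum of all entries of the variance-covariance matrix. The numbers below are hypothetical and are not taken from the regeneration model.

```python
def mean_of_sum(means):
    """Mean of a sum of random variables."""
    return sum(means)

def variance_of_sum(cov):
    """Variance of a sum: the sum of all entries of the covariance matrix."""
    return sum(sum(row) for row in cov)

# Hypothetical two-variable example.
means = [2.0, 5.0]
cov_correlated = [[1.0, 0.5], [0.5, 2.0]]
cov_uncorrelated = [[1.0, 0.0], [0.0, 2.0]]

print(mean_of_sum(means))                 # 7.0
print(variance_of_sum(cov_correlated))    # 4.0
print(variance_of_sum(cov_uncorrelated))  # 3.0
```

Dropping a positive covariance leaves the mean of the sum unchanged but lowers its variance, which is the analytical counterpart of the comparison between cross-correlated and uncorrelated simulations.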
Results Evaluation of the Simulation Method The simulation method was evaluated by refitting the regeneration model using the simulated data sets and comparing the published and refitted models, especially the variance-covariance matrixes of the stand-level random effects and plot-level random errors. The cross-correlations among the plot-level random errors and stand-level random effects of the models refitted to the data simulated by adding cross-correlated random components into the predictions were reasonably close to the published ones (Figure 2). Because of right-censoring of predictions (to 20 trees per plot and per tree species class) and truncation of random effects, however, cross-correlations among natural pines (n2), spruces (n3), and birches (n4) were underestimated at both plot and stand levels. As expected, the cross-correlations were close to zero when the data sets from the uncorrelated simulations were used in refitting the model. For the simulated data, the variance components of the plot-level
0.4
5.0 Measured data
Measured data Correlated
4.0
Uncorrelated 0.2
Variance
Cross-correlation
0.3
Correlated
0.1 0.0 -0.1
n2
n3
n4
n5
2.0 1.0
n2 n3 n4 n5 h6 h7 n3 n4 n5 h6 h7 n4 n5 h6 h7 n5 h6 h7 h6 h7 h7 n1
3.0
h6
0.0
Plot-level random errors
n1
n2
n3
n4
n5
h6
h7
Plot-level random error 0.4
Cross-correlation
Measured data 0.3
Correlated Uncorrelated
5.0
0.2
Measured data 0.1
Correlated
4.0
-0.1 n2 n3 n4 n5 h6 h7 n3 n4 n5 h6 h7 n4 n5 h6 h7 n5 h6 h7 h6 h7 h7 n1
n2
n3
n4
n5
Variance
0.0
3.0 2.0
h6
Stand-level random effects
1.0
Figure 2. Cross-correlations among the plot-level random errors (top) and stand-level random effects (bottom) of the models fitted using the measured and simulated data on number of planted spruces (n1), natural pines (n2), natural spruces (n3), seed-origin birches (n4), and other broadleaves (n5), as well as height of the crop-tree spruce (h6) and dominant height of broadleaves (h7). For simulated data, the mean (ⴞSD) cross-correlations of 10 simulations using cross-correlated and uncorrelated random components are presented.
random errors of all overdispersed count responses (n2–n5) were slightly underestimated, whereas those of the standlevel random effects of natural pines (n2) and spruces (n3) were clearly overestimated (Figure 3). All fixed parameters of the models fitted using the data sets from stochastic simulations were close to the published ones (refitted models not presented). The cross-correlations among the simulated responses were compared with those among the measured responses. The cross-correlations among the responses (especially tree heights), which were simulated deterministically, were high and in absolute terms exceeded the observed cross-correlations at both plot and stand levels (Figure 4). The high cross-correlations among the deterministic predictions were due to the fact that virtually the same variables were used as fixed predictors in the models (Miina and Saksa 2006). Adding the uncorrelated random components into the predictions resulted in almost zero-cross-correlations among the predictions. On the contrary, both plot- and stand-level cross-correlations observed in the modeling data were obtained reasonably well by adding the cross-correlated random components into the predictions. Only the cross-correlations among the numbers of natural seedlings (n2, n3, and n4) were slightly underestimated. When the stochastic simulations were repeated 10 times for the modeling data (40 municipalities, 1,541 stands, and 22,007 plots), in the means and standard deviations, there were only slight differences between replicates. Only the regeneration results from the first stochastic simulations are
[Figure 3 plot: panel "Stand-level random effect"; x-axis categories n1, n2, n3, n4, n5, h6, h7.]
Figure 3. Variances of the plot-level random errors (top) and stand-level random effects (bottom) of the models fitted using the measured and simulated (cross-correlated random components) data on number of planted spruces (n1), natural pines (n2), natural spruces (n3), seed-origin birches (n4), and other broadleaves (n5), as well as height of the crop-tree spruce (h6) and dominant height of broadleaves (h7). For measured and simulated data, error bars indicate ±SE of the variance component and ±SD of the variance component of 10 simulations, respectively.
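The behavior of the cross-correlations reported in the text (high for deterministic predictions that share fixed predictors, close to zero once uncorrelated random components are added, and intermediate with cross-correlated components) can be illustrated with a toy example. This is only a sketch under stated assumptions, not the fitted model: the two linear predictors, their coefficients, the noise standard deviation, and the function names `pearson` and `prediction_correlation` are all hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prediction_correlation(noise_sd, noise_rho, n=20000, seed=7):
    """Correlation between two responses that share one fixed predictor.

    noise_sd  -- SD of the random component added to each response
                 (0 gives purely deterministic predictions)
    noise_rho -- cross-correlation between the two random components
    All coefficients below are hypothetical illustration values.
    """
    rng = random.Random(seed)
    resp_a, resp_b = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)   # shared fixed predictor
        e1 = rng.gauss(0, 1)  # random component of response A
        # Random component of response B, correlated with e1 by noise_rho
        e2 = noise_rho * e1 + math.sqrt(1 - noise_rho**2) * rng.gauss(0, 1)
        resp_a.append(1.0 + 0.8 * x + noise_sd * e1)
        resp_b.append(2.0 + 0.5 * x + noise_sd * e2)
    return pearson(resp_a, resp_b)


r_deterministic = prediction_correlation(noise_sd=0.0, noise_rho=0.0)
r_uncorrelated = prediction_correlation(noise_sd=3.0, noise_rho=0.0)
r_correlated = prediction_correlation(noise_sd=3.0, noise_rho=0.3)
print(r_deterministic, r_uncorrelated, r_correlated)
```

In this toy setting the random components dominate the total variance, so the cross-correlation of the simulated responses is governed mainly by the cross-correlation assigned to those components rather than by the shared predictor.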
presented and compared with the measured ones, as well as with the results of the deterministic simulation, in Table 2. According to the mean values of the simulated responses, the stochastic predictions were underestimates, especially for the number of natural pines (n2). This result was due to right-censoring of the counts. The effect of truncation of random effects on the plot-level means was corrected by the coefficients cl. The deterministic predictions were also slightly biased, especially for tree heights (cf. Miina and Saksa 2006). As expected, the total variation in deterministic predictions was clearly lower than that in stochastic predictions. Thus, only stochastic simulations were able to mimic the total variation in responses.
The generated marginal distributions are illustrated in Figures 5 and 6, where the cumulative percentages of the numbers of trees simulated using cross-correlated random components are compared with the observed percentages. The cumulative percentages of the observed and simulated numbers of planted spruces agreed well. Some differences were found in the frequencies of plots and stands that had a small number of natural pines (n2) and spruces (n3) and a large number of deciduous trees (n4 and n5). In the modeling data, about 70% of the plots had no natural pines or spruces; in the simulations, the proportion of zero-plots was underestimated. In addition, the percentage of plots with at least 20 trees (threshold value) was underestimated, especially for seed-origin birch (n4) and other broadleaves (n5). The marginal distributions generated in uncorrelated stochastic simulations did not differ from those obtained in cross-correlated stochastic simulations (results not presented). In contrast, the cumulative percentages calculated using deterministic predictions again showed that the fixed part of the model was not able to mimic the total variation in establishment of regeneration.
[Figure 4 plot: panels "Plot-level responses" (top) and "Stand-level responses" (bottom); y-axis: cross-correlation (-0.4 to 1.0); x-axis: response pairs among n1, n2, n3, n4, n5, h6, h7; series: Observed, Correlated, Uncorrelated, Deterministic.]
Figure 4. Plot-level (top) and stand-level (bottom) cross-correlations among measured (observed) and simulated responses: the number of planted spruces (n1), natural pines (n2), natural spruces (n3), seed-origin birches (n4), and other broadleaves (n5), as well as the height of the crop-tree spruce (h6) and dominant height of broadleaves (h7). The mean (±SD) values of 10 simulations are presented for stochastic simulations using cross-correlated and uncorrelated random components.
Results of Simulations
Simulations were conducted to test the hypothesis that ignoring the cross-correlations among the random components will affect silvicultural decisions made on the basis of the simulated regeneration result. In planted stands, if the mortality of planted seedlings is high, it is important to have naturally regenerated seedlings that supplement the regeneration. Therefore, the effect of simulating the unexplained variation was evaluated by calculating three different sums for the number of trees per plot: Sum1 = planted spruces and natural pines and spruces (n1 + n2 + n3); Sum2 = planted spruces and seed-origin birches (n1 + n4); and Sum3 = planted spruces, natural pines and spruces, and seed-origin birches (n1 + n2 + n3 + n4). The regeneration result was classified as poor if there were ≤2 trees per plot or, on average, <2.5 trees per plot within a stand (i.e., <1,250 trees/ha). The proportion of poorly regenerated plots and stands was calculated in the modeling and simulated data sets. In stochastically simulated data sets, when only planted spruces (n1) were counted, the proportion of poorly regenerated plots and stands was overestimated by almost 10 percentage points (Table 3). If natural pines and spruces were also counted together with planted spruces (Sum1), the proportion of poorly regenerated plots and stands was overestimated by 4–6 percentage points. Counting seed-origin birches together with planted spruces (Sum2) resulted in an unbiased proportion
Table 2. Mean (± SD) values of the measured and simulated numbers (trees/20 m²) and heights (cm) of trees at plot and stand levels

                                                    Simulated
Variable                          Measured          CC                RR                Deterministic
At plot level (N = 22,007)
  No. of planted spruce (n1)      2.73 ± 1.5        2.70 ± 1.5        2.69 ± 1.5        2.65 ± 0.2
  No. of natural pine (n2)        0.93 ± 2.6        0.65 ± 2.3        0.66 ± 2.3        0.84 ± 0.4
  No. of natural spruce (n3)      1.18 ± 2.7        1.15 ± 2.9        1.11 ± 2.8        1.21 ± 0.4
  No. of natural birch (n4)       4.33 ± 6.0        4.06 ± 5.3        4.15 ± 5.4        4.38 ± 0.8
  No. of other broadleaves (n5)   7.98 ± 6.4        7.34 ± 5.7        7.26 ± 5.7        7.78 ± 0.2
  Height of crop-tree spruce (h6) 53.0 ± 20.7       52.7 ± 18.9       51.7 ± 18.6       50.2 ± 5.0
  Height of broadleaves (h7)      119.3 ± 58.5      114.8 ± 53.9      112.9 ± 53.9      110.0 ± 10.0
At stand level (N = 1,541)
  No. of planted spruce (n1)      2.71 ± 0.8        2.69 ± 0.9        2.68 ± 0.9        2.64 ± 0.2
  No. of natural pine (n2)        0.97 ± 1.9        0.65 ± 1.8        0.66 ± 1.7        0.85 ± 0.4
  No. of natural spruce (n3)      1.21 ± 1.7        1.11 ± 2.1        1.10 ± 1.9        1.22 ± 0.4
  No. of natural birch (n4)       4.40 ± 4.4        4.09 ± 4.0        4.19 ± 4.2        4.38 ± 0.7
  No. of other broadleaves (n5)   8.08 ± 4.1        7.38 ± 3.8        7.31 ± 3.7        7.78 ± 0.2
  Height of crop-tree spruce (h6) 53.0 ± 13.5       52.8 ± 12.4       52.0 ± 12.0       48.4 ± 5.2
  Height of broadleaves (h7)      119.4 ± 44.4      114.8 ± 41.2      113.5 ± 41.0      108.8 ± 10.2

The results of the first simulation are shown for stochastic simulations using random components: CC, cross-correlated at stand and plot levels; RR, uncorrelated at stand and plot levels.
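The classification rule described in the text (the sums Sum1 to Sum3, a plot counted as poor with at most 2 trees, and a stand counted as poor when its mean is below 2.5 trees per plot) can be sketched as follows. This is a minimal illustration rather than the authors' code; the function names and the two example stands are hypothetical.

```python
from statistics import mean

PLOTS_PER_HA = 500  # a 20-m2 plot corresponds to 1/500 ha


def regeneration_sums(n1, n2, n3, n4):
    """Return (Sum1, Sum2, Sum3) for one plot, as defined in the text."""
    sum1 = n1 + n2 + n3       # planted spruces + natural pines and spruces
    sum2 = n1 + n4            # planted spruces + seed-origin birches
    sum3 = n1 + n2 + n3 + n4  # all of the above
    return sum1, sum2, sum3


def plot_is_poor(trees_on_plot):
    """A plot is poorly regenerated if it has at most 2 trees."""
    return trees_on_plot <= 2


def stand_is_poor(plot_counts):
    """A stand is poorly regenerated if its mean is below 2.5 trees per plot."""
    return mean(plot_counts) < 2.5


def poor_proportions(stands):
    """Percentages of poorly regenerated plots and stands.

    stands -- dict mapping a stand identifier to its list of per-plot counts
    """
    all_plots = [c for counts in stands.values() for c in counts]
    pct_plots = 100.0 * sum(plot_is_poor(c) for c in all_plots) / len(all_plots)
    pct_stands = 100.0 * sum(stand_is_poor(c) for c in stands.values()) / len(stands)
    return pct_plots, pct_stands


# Hypothetical per-plot counts for two stands (not data from the study)
example_stands = {"A": [3, 2, 4, 1], "B": [2, 2, 3, 2]}
print(poor_proportions(example_stands))
print(2.5 * PLOTS_PER_HA)  # stand-level threshold expressed as trees per hectare
```

The same functions can be applied to any of the three sums, or to n1 alone, by passing the corresponding per-plot totals.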
[Figure 5 plots: panels for planted spruce, seed-origin birch, natural pine, other broadleaves, and natural spruce; x-axis: number of trees per plot (0 to 20+); y-axis: percentage (%), 0 to 100.]
Figure 5. Plot-level cumulative percentages of the measured (bold lines) and simulated number of planted spruces (n1), natural pines (n2), natural spruces (n3), seed-origin birches (n4), and other broadleaves (n5). Simulated values were obtained by deterministic simulations (dotted lines) and by stochastic simulations (thin lines) using cross-correlated random components (mean ± SD of 10 simulations).
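The finding that uncorrelated and cross-correlated simulations gave the same marginal distributions is consistent with how a normal copula works: the dependence parameter changes the joint behavior but not the margins. The sketch below illustrates this for two counts only, using a binomial margin for an underdispersed count and a negative binomial margin for an overdispersed one. It is a single-level simplification, not the two-level model of the paper, and every parameter value (n = 4, p = 0.67, mean 4.3, size 0.6, cap at 20) as well as the function names are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist, mean


def binomial_cdf(n, p):
    """Cumulative probabilities P(X <= k), k = 0..n, of a binomial(n, p)."""
    total, cdf = 0.0, []
    for k in range(n + 1):
        total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(total)
    return cdf


def negbin_cdf(mu, size, kmax):
    """Cumulative probabilities, k = 0..kmax, of a negative binomial with
    mean mu and dispersion parameter size (variance = mu + mu**2 / size)."""
    p = size / (size + mu)
    total, cdf = 0.0, []
    for k in range(kmax + 1):
        log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                   + size * math.log(p) + k * math.log(1 - p))
        total += math.exp(log_pmf)
        cdf.append(total)
    return cdf


def quantile(cdf, u):
    """Smallest k with cdf[k] >= u; draws beyond the table are capped at kmax."""
    return min(bisect.bisect_left(cdf, u), len(cdf) - 1)


def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_counts(rho, n_plots=20000, seed=1):
    """Draw (n1, n4) pairs through a bivariate normal copula with correlation rho."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    spruce_cdf = binomial_cdf(n=4, p=0.67)             # hypothetical margin
    birch_cdf = negbin_cdf(mu=4.3, size=0.6, kmax=20)  # hypothetical margin, capped at 20
    n1, n4 = [], []
    for _ in range(n_plots):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        # Map each normal score to a uniform, then through the inverse count CDF
        n1.append(quantile(spruce_cdf, phi(z1)))
        n4.append(quantile(birch_cdf, phi(z2)))
    return n1, n4


for rho in (0.0, 0.8):
    n1, n4 = simulate_counts(rho)
    zero_share = sum(v == 0 for v in n4) / len(n4)
    print(f"rho={rho}: share of plots with n4 = 0: {zero_share:.3f}, "
          f"corr(n1, n4) = {pearson(n1, n4):.2f}")
```

Because each count is obtained from its own inverse CDF, the share of zero plots and the other marginal frequencies should agree between the two runs up to sampling noise, while the correlation between the counts changes.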
of poor regeneration result at the plot level, but underestimated the corresponding proportion at the stand level. Deterministic predictions clearly underestimated the proportion of poorly regenerated plots and stands. For example, because the predicted number of planted spruce varied only slightly around its mean (2.65 trees/plot), only 11.3% of the plots were poorly regenerated. Because of low cross-correlations among the stand-level random effects and plot-level random errors (as well as among response variables), there were no great differences between the predictions obtained using either cross-correlated or uncorrelated random components. When the stand-level random effects were simulated as cross-correlated and the plot-level random errors as uncorrelated (CR simulations in Table 3), the plot-level results were closer to those of the uncorrelated simulations (RR) than to those of the cross-correlated simulations (CC). The incorporation of cross-correlated stand-level random effects also induced cross-correlation at the plot level, but this was not sufficient to explain plot-level dependencies. As expected, the stand-level results of the CR simulations were close to those of the CC simulations. The effect of a higher cross-correlation between the number of planted spruces and seed-origin birches on the simulated regeneration result was studied by varying the cross-correlation between the random components from −0.9 to 0.9 (Figure 7). The proportion of plots having at least a given number of planted spruces and seed-origin
[Figure 6 plots: panels for planted spruce, seed-origin birch, natural pine, other broadleaves, and natural spruce; x-axis: number of trees per plot (0 to 20+); y-axis: percentage (%), 0 to 100.]
Figure 6. Stand-level cumulative percentages of the measured (bold lines) and simulated number of planted spruces (n1), natural pines (n2), natural spruces (n3), seed-origin birches (n4), and other broadleaves (n5). Simulated values were obtained by deterministic simulations (dotted lines) and by stochastic simulations (thin lines) using cross-correlated random components (mean ± SD of 10 simulations).
birches (Sum2) increased with increasing cross-correlation. The proportion of plots having at least 3 trees/plot (i.e., at least 1,500 trees/ha, which can be used as the minimum density for successful regeneration) was the most sensitive to changes in cross-correlation. It was also found that the same cross-correlation between the simulated responses can be obtained with several different combinations of cross-correlations between the stand-level random effects and plot-level random errors. In practical forestry, the density and dominant height of deciduous tree species of low economic value (i.e., other broadleaves in the model) are measured to determine the need for and timing of precommercial thinning. If field measurements are not performed, only the deterministic
predictions can be used to indicate those stands that need cleaning because of competition by broadleaves. However, the stochastic predictions encompass the total variation in establishment of regeneration, which can be used to estimate, for example, the total area and costs of cleaning. In this study, the cleaning costs (costs, euro/ha) were calculated for each measured and simulated stand as follows: $\text{costs} = 126.38 + 0.00312 \times D \times N + 0.73 \times D^{2}$, where $D$ and $N$, respectively, are the mean diameter at stump height (cm) and number of removed broadleaves ($N = 500 \times n_5$, ha$^{-1}$). In a given stand, the mean diameter of broadleaves at stump height was calculated as a function of dominant height of broadleaves (h7, cm) as follows: $D = -1.51 + 0.015 \times \max(130, h_7)$. The costs equation was
Table 3. Proportion (%) of plots and stands having a poor regeneration result (