This is the manuscript draft as first submitted to Forest Science. The accepted version (Manso, R., Ningre, F. and Fortin, M. 2018. Simultaneous prediction of plot-level and tree-level harvest occurrences with correlated random effects. Forest Science. DOI: 10.1093/forsci/fxy015) will be available online at the journal site in due course.
Simultaneous prediction of plot-level and tree-level harvest occurrences with correlated random effects
Rubén Manso1,2,*, François Ningre1, and Mathieu Fortin3
1 INRA, UMR 1092 LERFoB, 1 rue de l’Arboretum, 54280, Champenoux, France
2 Forest Research, Northern Research Station, Roslin, Midlothian, UK, EH25 9SY
3 AgroParisTech, UMR 1092 LERFoB, 14 rue Girardet, 54042 Nancy, France
* Corresponding author: e-mail: [email protected], [email protected]
ACKNOWLEDGEMENTS
The authors want to thank all the people involved in the measurements of the permanent-plot network, and especially Daniel Rittié (LERFoB), who gathered and formatted the data. Special thanks are due to Adam Ash (Forest Research), who thoroughly proofread the original version of the manuscript. The UMR 1092 LERFoB is supported by a grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (ANR-11-LABX-0002-01, Lab of Excellence ARBRE).
Management and Policy Implications
The probability that a plot is thinned and the probability that a tree within that plot is harvested are likely to be correlated. Neglecting this correlation would lead to an underestimation of the variance of the prediction error in harvest models. As a result, forest managers using growth simulators that include two-level harvest models would have a false sense of precision in stand projections. This may result in wrong decisions in the long run. The proposed method deals with this issue and provides theoretically unbiased variance estimates.
Simultaneous prediction of plot-level and tree-level harvest occurrences with correlated random effects
PREDICTION OF PLOT-LEVEL AND TREE-LEVEL HARVEST OCCURRENCES
ABSTRACT
In forestry, harvest models have become popular for forecasting thinning under business-as-usual scenarios. There are two binary processes involved in thinning operations: (i) whether a plot is to be thinned and (ii) whether a particular tree within that plot is to be harvested. These processes can be modeled using logistic regressions. The data used to fit such models come from forest inventories, where the observations are not usually independent. Random effects can be used to deal with these correlations. However, fitting the plot-level and tree-level models independently hinders the estimation of the covariance between the random effects of both models. The objective of this paper was to develop a statistical method for the simultaneous prediction of harvest probabilities at the plot and tree levels in a single mixed-effects model. We developed a maximum likelihood estimator based on the joint distribution of the probability that a given plot is thinned and the probability that a given tree within that plot is harvested. The estimator was derived from a zero-altered binomial form, but it assumed distinct harvest probabilities for each single tree. The estimator was tested in the case study of mixed stands of oak (Quercus spp.) and beech (Fagus sylvatica L.) in Northern France.
Keywords: harvest models, zero-altered models, mixed-effects models, simultaneous estimation, joint distribution
1 INTRODUCTION
Forest dynamics simulators require at least a growth module and a mortality module, which are usually implementations of statistical models. When these simulators are intended to make predictions of forest evolution in managed stands, a harvest module is also needed (Fortin 2014). In contrast to growth and mortality, harvest modules have traditionally relied on algorithms that deterministically decide when, where and how to cut according to some management rules (e.g. Pukkala and Miina 1998). The use of these algorithms is conceptually straightforward if we assume that the realized harvesting strictly follows what was planned. However, for many reasons, the intended management often differs from its practical application. When working with business-as-usual scenarios, statistical harvest models aimed at emulating real management decisions are a competing alternative to traditional algorithms (Antón-Fernández and Astrup 2012).
Technically speaking, harvest models have to deal with a binary outcome at both the plot and the tree levels (i.e. either a plot is thinned or not; either a tree is cut down or not). Logistic regression is the usual statistical technique to model such response variables (cf. Hosmer and Lemeshow 2000). Therefore, the output of harvest models is the probability that a plot be thinned or that a tree be harvested, depending on the level at which the model applies. Among the few papers on this subject, several studies have strictly focused on forecasting the probabilities at the plot level (e.g. Antón-Fernández and Astrup 2012; Melo et al. 2017), sometimes discriminating between different harvest modalities (e.g. Sterba et al. 2000). Fortin (2014) fitted tree-level models using data from plots located in areas with ongoing thinning operations.
A more informative method would be the combination of both levels, allowing for thinned and unthinned plots in the training dataset. This implies two independent logistic models, with potentially different covariates in each one of them. These combined models would predict the harvest probability for a particular plot and, given that this plot is thinned, the harvest probability for a particular tree within it. This two-level alternative was developed in the context of forest dynamics simulators by Thurnher et al. (2011). A similar approach was used by Eastaugh and Hasenauer (2012), who substituted predictions of the proportion of harvested stand volume in thinned plots for tree-level harvest probabilities.
Given the hierarchical structure of most forest inventories, correlations between observations from the same plot or measurement year are likely to occur. The consequence of not dealing with these correlations is the underestimation of the variance of the fixed-effects parameter estimates (Gregoire et al. 1995). The correlation issue has been addressed through copulas in tree-level harvest models (Fortin et al. 2013; Delisle-Boulianne et al. 2014). Mixed-effects modeling is an alternative way to cope with correlations and is the most popular in forest science. Standard statistical software such as SAS or R allows for the inclusion of random effects in logistic models and could be easily used to improve the existing two-level harvest models as well. However, a major challenge remains: the covariance between the random effects included in both models cannot be ascertained. This could only be achieved by simultaneously estimating the parameters from both models.
One way to do this is to define the joint distribution of the probability of a plot-level harvest and the probabilities of individual-tree harvests conditional on harvest occurrence at plot level. Depicting this joint probability was the main objective of the present study. The approach was inspired by the mathematical description of the so-called zero-inflated distributions, i.e. distributions holding more zeros than those expected at random (cf. Lambert 1992). Zero-inflated distributions are basically defined through the coupling of a Bernoulli distribution, which yields the probability of observing a zero event, with another distribution that yields the probability of observing non-zero events. This can also be interpreted as an event involving two processes where one is conditional on the other. In the current context, harvesting a particular plot can be seen as a first Bernoulli process, while harvesting a tree within this plot can be interpreted as another Bernoulli process conditional on the first one.
In order to illustrate the approach, we chose the case study of even-aged mixed stands of oak (Quercus spp.) and beech (Fagus sylvatica L.) in Northern France. Specifically, we used a network of permanent plots that was set up to investigate different thinning schemes for both species. A two-level harvest model with mixed effects was fitted to these data and random effects at the plot and tree levels were allowed to be correlated.
2 MATERIAL AND METHODS
2.1 DATA
The data used in the present study were selected from a series of historical and ongoing silvicultural experiments that are monitored by the Laboratoire d’Etude de Ressources Forêt-Bois (LERFoB) and were set up between 1883 and 1956. This compilation constitutes the LERFoB permanent-plot network, which is primarily aimed at analysing the effect of management on growth and mortality patterns in even-aged sessile oak (Quercus petraea (Matt.) Liebl.) and European beech stands across Northern France. In this respect, low thinnings were applied with the objective of keeping the stand density below a given value. All control plots (i.e. those never thinned) were discarded for the purposes of this study. Given the heterogeneous origin of the data, diameter distribution, plot size and measurement intervals varied notably from one site to another. This information is provided in Table 1 together with the proportion of occasions where thinnings took place.
(Insert Table 1 here)
The data we used comprised over 175 000 records of diameter at breast height (DBH, 1.3 m) from some 33 000 trees within 75 plots in Northern France (Fig. 1) measured between 1922 and 2012. For each DBH measurement, tree status was also recorded (alive, dead or recently harvested; see Table 2 for a summary of the proportions of harvested trees). Although beech and/or oak predominated, other species were also present, of which European hornbeam (Carpinus betulus L.) was the most common.
(Insert Fig. 1)
(Insert Table 2 here)
2.2 STATISTICAL DEVELOPMENTS
The probability that a harvest takes place in a plot and that some trees in this plot are then harvested can be thought of as a joint probability distribution derived from many univariate Bernoulli distributions. Let us assign indices $i$, $j$ and $k$ to the plot, the tree and the measurement year, respectively. Plot-level harvest occurrence can be seen as a Bernoulli process with outcome $q_{ik} = 1$ and $\Pr(q_{ik} = 1) = \lambda_{ik}$ if plot $i$ is harvested at year $k$. Otherwise, $q_{ik} = 0$ and $\Pr(q_{ik} = 0) = 1 - \lambda_{ik}$. The probability mass function $f(q_{ik}; \lambda_{ik})$ of plot-level harvest occurrence is then

$$f(q_{ik}; \lambda_{ik}) = \lambda_{ik}^{q_{ik}} (1 - \lambda_{ik})^{1 - q_{ik}} \tag{1}$$
Similarly, the harvest of individual trees is also a Bernoulli process with outcome $u_{ijk} = 1$ when tree $j$ is felled in plot $i$ at year $k$, or $u_{ijk} = 0$ otherwise. As the harvest of a tree is conditional on the harvest occurrence at plot level, the probability that $u_{ijk} = 1$ needs to be defined accordingly as $\Pr(u_{ijk} = 1 \mid q_{ik} = 1) = \pi_{ijk}$. Likewise, the probability of a tree being left standing after a thinning operation is $\Pr(u_{ijk} = 0 \mid q_{ik} = 1) = 1 - \pi_{ijk}$. The joint probability mass function of observing $q_{ik}$ and $\mathbf{u}_{i\cdot k} = (u_{i1k}, u_{i2k}, \ldots, u_{i n_{ik} k})^T$ conditional on $q_{ik} = 1$ is then

$$\Pr(q_{ik}, \mathbf{u}_{i\cdot k}) = (1 - \lambda_{ik})^{1 - q_{ik}} \left( \lambda_{ik} \frac{\prod_{j=1}^{n_{ik}} \pi_{ijk}^{u_{ijk}} (1 - \pi_{ijk})^{1 - u_{ijk}}}{1 - \prod_{j=1}^{n_{ik}} (1 - \pi_{ijk})} \right)^{q_{ik}} \tag{2}$$
The denominator $1 - \prod_{j=1}^{n_{ik}} (1 - \pi_{ijk})$ in Eq. 2 is the probability that at least one tree is harvested. This truncation at zero is required since a plot-level harvest occurrence cannot, by definition, result in no harvested tree.
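Eq. 2 can be made concrete with a short numerical sketch. The Python function below evaluates the joint probability for one plot and one measurement year from a plot-level probability and a list of tree-level probabilities. The function name and the numerical values are illustrative assumptions and are not taken from the study. Returning zero for combinations that the model rules out (a thinned plot without any harvested tree, or a harvested tree in an unthinned plot) is also a convention chosen here.

```python
from itertools import product
from math import prod


def joint_harvest_prob(q, u, lam, pi):
    """Joint probability of Eq. 2 for one plot and one measurement year.

    q   : 1 if the plot is thinned, 0 otherwise
    u   : list of 0/1 tree-level harvest indicators
    lam : plot-level harvest probability (lambda_ik)
    pi  : list of tree-level harvest probabilities given q = 1 (pi_ijk)
    """
    if q == 0:
        # No tree can be harvested in an unthinned plot (assumed convention).
        return 1.0 - lam if not any(u) else 0.0
    if not any(u):
        # Zero truncation: a thinned plot has at least one harvested tree.
        return 0.0
    # Product of unequal-probability Bernoulli terms (numerator of Eq. 2).
    numerator = prod(p if uj else 1.0 - p for uj, p in zip(u, pi))
    # Probability that at least one tree is harvested (denominator of Eq. 2).
    at_least_one = 1.0 - prod(1.0 - p for p in pi)
    return lam * numerator / at_least_one


# Hypothetical plot with two trees.
lam, pi = 0.6, [0.3, 0.5]
total = joint_harvest_prob(0, [0, 0], lam, pi) + sum(
    joint_harvest_prob(1, list(u), lam, pi) for u in product([0, 1], repeat=2)
)
```

Summing over all admissible outcomes gives a total of 1 up to floating-point error, which is precisely what the denominator of Eq. 2 ensures.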
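If parameters $\lambda_{ik}$ and $\pi_{ijk}$ are modeled as functions of some covariates, then it is possible to link the harvest probabilities to management, site conditions or other factors thought to influence the occurrence of harvest at plot and tree levels. In order to link $\lambda_{ik}$ and $\pi_{ijk}$ to a set of covariates $(\mathbf{x}_{ik}, \mathbf{z}_{ijk})$, we used the logit link function (McCullagh and Nelder 1989), so that

$$\lambda_{ik} = \frac{e^{\mathbf{x}_{ik} \boldsymbol{\beta}}}{1 + e^{\mathbf{x}_{ik} \boldsymbol{\beta}}} \tag{3}$$

$$\pi_{ijk} = \frac{e^{\mathbf{z}_{ijk} \boldsymbol{\gamma}}}{1 + e^{\mathbf{z}_{ijk} \boldsymbol{\gamma}}} \tag{4}$$

where $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of parameters. In the end, a model likelihood can be expressed on the basis of Eq. 2:

$$L(\boldsymbol{\beta}, \boldsymbol{\gamma} \mid \mathbf{q}, \mathbf{u}, \mathbf{X}, \mathbf{Z}) = \prod_i \prod_k \Pr(q_{ik}, \mathbf{u}_{i\cdot k} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}, \mathbf{x}_{ik}, \mathbf{Z}_{i\cdot k}) \tag{5}$$

where matrix $\mathbf{Z}_{i\cdot k}$ has its rows equal to the $\mathbf{z}_{ijk}$.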
Due to the hierarchical structure of the data, the observations within the same plot and year may not be independent. Random effects can be specified in the model in order to relax the assumption of independence. In order to simplify the derivation, we will focus on plot random effects only, but the developments can be extended to year random effects as well.
Random effects were included in Eqs. 3 and 4 so that

$$\lambda_{ik}(b_{i,1}) = \frac{e^{\mathbf{x}_{ik} \boldsymbol{\beta} + b_{i,1}}}{1 + e^{\mathbf{x}_{ik} \boldsymbol{\beta} + b_{i,1}}} \tag{6}$$

$$\pi_{ijk}(b_{i,2}) = \frac{e^{\mathbf{z}_{ijk} \boldsymbol{\gamma} + b_{i,2}}}{1 + e^{\mathbf{z}_{ijk} \boldsymbol{\gamma} + b_{i,2}}} \tag{7}$$

where $\mathbf{b}_i = (b_{i,1}, b_{i,2})^T$ is a vector of plot random effects, which is assumed to follow a bivariate normal distribution, such that $\mathbf{b}_i \sim N_2(\mathbf{0}, \mathbf{G})$.
Adapting Eq. 5 accordingly leads to a conditional likelihood that depends on the unobserved random effects $\mathbf{b}_i$. Parameter estimation relies on this conditional likelihood marginalized over the distribution of the random effects (Pinheiro and Bates 2000, p.62):

$$L(\boldsymbol{\beta}, \boldsymbol{\gamma}, \mathbf{G} \mid \mathbf{q}, \mathbf{u}, \mathbf{X}, \mathbf{Z}) = \prod_i \int \prod_k \Pr(q_{ik}, \mathbf{u}_{i\cdot k} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}, \mathbf{x}_{ik}, \mathbf{Z}_{i\cdot k}, \mathbf{b}_i) \, \mathrm{pdf}(\mathbf{b}_i, \mathbf{G}) \, d\mathbf{b}_i \tag{8}$$

where $\mathrm{pdf}(\mathbf{b}_i, \mathbf{G})$ is the density of the bivariate normal distribution with mean $\mathbf{0}$ and variance-covariance $\mathbf{G}$. This complex likelihood can be maximized using the NLMIXED procedure available in SAS (SAS Institute Inc. 2008). An example of the code we used in this study is shown in Appendix A.
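To illustrate what the integral in Eq. 8 involves, the sketch below approximates the contribution of a single plot with a non-adaptive, five-point Gauss-Hermite product rule written in Python. It is not the SAS code of Appendix A and makes no claim about the quadrature settings used in the study. All function names, the number of quadrature points and the input values are assumptions made for illustration. The inputs are assumed to be consistent, i.e. a thinned plot has at least one harvested tree.

```python
from math import exp, pi, prod, sqrt

# 5-point Gauss-Hermite rule (weight function exp(-x^2)).
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591]


def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-eta))


def conditional_prob(q, u, eta_plot, eta_trees, b1, b2):
    """Eq. 2 evaluated with the random effects of Eqs. 6 and 7."""
    lam = expit(eta_plot + b1)
    if q == 0:
        return 1.0 - lam
    p = [expit(e + b2) for e in eta_trees]
    num = prod(pj if uj else 1.0 - pj for uj, pj in zip(u, p))
    return lam * num / (1.0 - prod(1.0 - pj for pj in p))


def marginal_plot_likelihood(records, s11, s12, s22):
    """Approximate the integral of Eq. 8 for one plot.

    records       : list of (q, u, eta_plot, eta_trees), one tuple per year k
    s11, s12, s22 : elements of G (two variances and the covariance)
    """
    # Cholesky factor of G, written out for the 2 x 2 case.
    l11 = sqrt(s11)
    l21 = s12 / l11 if l11 > 0 else 0.0
    l22 = sqrt(max(s22 - l21 ** 2, 0.0))
    total = 0.0
    for x1, w1 in zip(GH_NODES, GH_WEIGHTS):
        for x2, w2 in zip(GH_NODES, GH_WEIGHTS):
            # Map standard nodes to correlated random effects b = sqrt(2) L x.
            b1 = sqrt(2.0) * l11 * x1
            b2 = sqrt(2.0) * (l21 * x1 + l22 * x2)
            cond = prod(conditional_prob(q, u, ep, et, b1, b2)
                        for q, u, ep, et in records)
            total += w1 * w2 * cond / pi
    return total
```

When all elements of G are set to zero, the function returns the conditional likelihood evaluated at zero random effects, which is a convenient sanity check. A maximum likelihood fit would wrap the product of such plot contributions in a numerical optimizer.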
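Matrix $\mathbf{G}$ contains the variances of $b_{i,1}$ and $b_{i,2}$ on its diagonal whereas the off-diagonal elements consist of the covariance between $b_{i,1}$ and $b_{i,2}$:

$$\mathbf{G} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} \\ \sigma_{1,2} & \sigma_2^2 \end{pmatrix} \tag{9}$$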
It can be reasonably assumed that the covariance between random effects $b_{i,1}$ and $b_{i,2}$ is nonzero, which means that the harvest probabilities at plot and tree levels would be correlated to some degree. This covariance parameter $\sigma_{1,2}$ can be estimated together with the rest of the parameters through the maximization of the likelihood in Eq. 8. In contrast, if we assume that $\sigma_{1,2} = 0$, then the model would be equivalent to fitting a plot-level and a tree-level model independently from each other. We will refer to this last model as the constrained model, as opposed to the unconstrained model in which the covariance is allowed to be nonzero.
2.3 MODEL SPECIFICATIONS AND EVALUATION
Different variables were tested sequentially in vectors $\mathbf{x}_{ik}$ and $\mathbf{z}_{ijk}$. Because our dataset included mixed stands, a species effect was specified in interaction with the other covariates. This species effect had four classes: “oak”, “beech”, “hornbeam” and “others”, the last one grouping all other marginal species.
Preliminary fits were carried out on a model without random effects and a visual check of the Pearson residuals (Hosmer and Lemeshow 2000, p.155) at the plot and the tree level served to assess a suitable combination of variables in $\mathbf{x}_{ik}$ and $\mathbf{z}_{ijk}$. These preliminary tests showed that $\pi_{ijk}$ exhibited a complex pattern with respect to the relative DBH, which was calculated as $\mathrm{rDBH}_{ijk} = \mathrm{DBH}_{ijk} / \mathrm{MQD}_{ik}$, where $\mathrm{MQD}_{ik}$ is the mean quadratic diameter of plot $i$ at year $k$. A segmented regression was then used to accommodate this pattern. Two segments were required, with $\mathrm{rDBH}_{ijk} = 0.9$ being set heuristically as the joint between them. At the plot level, the time elapsed since the last harvest ($\mathrm{LASTCUT}_{ik}$) seemed to be the covariate that best predicted the probability of harvest. For the early measurements, the time since the last harvest was unobserved, either because the plots had never been harvested or because the last harvest was carried out before the beginning of the monitoring. In these cases, plot basal area ($\mathrm{BA}_{ik}$) was used instead.
The preliminary model structure was

$$\mathbf{x}_{ik} \boldsymbol{\beta} = \beta_0 + \beta_1 \mathrm{BA}_{ik} r_{ik} + \beta_2 \mathrm{LASTCUT}_{ik} (1 - r_{ik}) \tag{10}$$

$$\mathbf{z}_{ijk} \boldsymbol{\gamma} = \gamma_0 + (\gamma_{1,s} + \gamma_{2,s} m_{ijk}) d_{ijk} + (\gamma_3 + \gamma_4 (1 - m_{ijk})) d_{ijk}^2 \tag{11}$$

where $r_{ik}$ is a dummy variable that takes the value 1 if the time elapsed since the last harvest is unknown or 0 otherwise; $d_{ijk} = \mathrm{rDBH}_{ijk} - 0.9$ is a transformation that allows both segments to converge to the same value at the joint; $m_{ijk}$ is another dummy variable that equals 1 if $d_{ijk} \geq 0$, and 0 otherwise; $s$ stands for the species index.
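The segmented structure of Eq. 11 may be easier to follow in code. The Python sketch below computes the tree-level linear predictor for one species and converts it into a probability through the logit link of Eq. 4. The function names and all parameter values are hypothetical and serve only to exercise the formula. They are not the estimates reported in Table 3.

```python
from math import exp


def tree_linear_predictor(rdbh, gamma0, gamma1_s, gamma2_s, gamma3, gamma4):
    """Segmented linear predictor of Eq. 11 for one tree of species s."""
    d = rdbh - 0.9            # distance to the joint at rDBH = 0.9
    m = 1 if d >= 0 else 0    # 1 on the upper segment, 0 on the lower one
    return (gamma0
            + (gamma1_s + gamma2_s * m) * d
            + (gamma3 + gamma4 * (1 - m)) * d ** 2)


def harvest_prob(rdbh, **gammas):
    """Tree-level harvest probability through the logit link (Eq. 4)."""
    eta = tree_linear_predictor(rdbh, **gammas)
    return 1.0 / (1.0 + exp(-eta))


# Hypothetical parameter values, chosen only to exercise the function.
params = dict(gamma0=-1.0, gamma1_s=-2.0, gamma2_s=1.5, gamma3=0.8, gamma4=3.0)
eta_at_joint = tree_linear_predictor(0.9, **params)
```

Because both the linear and the quadratic terms vanish when d = 0, the predictor equals the intercept at the joint whichever segment is active, which is how the two segments are made to meet.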
The aforementioned Pearson residuals were computed over regular intervals of a particular covariate to identify any lack of fit. For each interval $g$ of this covariate, we defined a subset of observed events as well as their corresponding predicted probabilities. If $\bar{y}_g$ and $\bar{\hat{y}}_g$ are the means of the observations and the predictions, respectively, then the Pearson residual for interval $g$ ($\mathrm{res}_g$) can be calculated as

$$\mathrm{res}_g = \frac{\bar{y}_g - \bar{\hat{y}}_g}{\sqrt{\bar{\hat{y}}_g (1 - \bar{\hat{y}}_g) / n_g}} \tag{12}$$

where $n_g$ is the number of observations in interval $g$. As a result of the joint probability used in Eq. 2, the observations were $q_{ik} \cdot u_{ijk}$ whereas the predictions were calculated as $\hat{\lambda}_{ik} \cdot \hat{\pi}_{ijk}$.
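As a minimal sketch of Eq. 12, the Python function below computes the grouped Pearson residual for a single covariate interval. The function name and the toy data are assumptions for illustration. In the setting described above, the observations would be the products of the plot-level and tree-level indicators and the predictions the products of the two predicted probabilities.

```python
from math import sqrt


def pearson_residual(observed, predicted):
    """Grouped Pearson residual of Eq. 12 for one covariate interval g."""
    n_g = len(observed)
    mean_obs = sum(observed) / n_g
    mean_pred = sum(predicted) / n_g
    return (mean_obs - mean_pred) / sqrt(mean_pred * (1.0 - mean_pred) / n_g)


# Hypothetical interval with four trees: observations are q * u,
# predictions are lambda_hat * pi_hat.
res_balanced = pearson_residual([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
res_excess = pearson_residual([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

In this toy example the first interval gives a residual of 0 because the observed and predicted means coincide, whereas the second gives a residual of 2 because harvests are more frequent than predicted.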
Once the covariates were specified, the model was re-fitted with plot and year random effects. Two fits were carried out: a first one where random effects were allowed to correlate (unconstrained model) and a second one with independent random effects (constrained model).
A major issue with this kind of mixed-effects model is that the predictions obtained with the random effects set to 0 are not population-averaged predictions (McCulloch et al. 2008, p.190). One method for obtaining population-averaged predictions is to integrate the predictions that are conditional on the random effects over the distribution of the random effects, just as is done for the likelihood function in Eq. 8. This integral has no closed-form solution but can be easily approximated using Gauss-Hermite quadrature (Fortin 2013).
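The difference between conditional and population-averaged predictions can be shown with a deliberately simplified, one-dimensional sketch in Python. It averages a single logistic probability over one normally distributed random effect using a five-point Gauss-Hermite rule. The joint prediction in this study involves two correlated random effects and the product of two probabilities, which this sketch does not reproduce. The function names, the number of quadrature points and the numerical values are assumptions.

```python
from math import exp, pi, sqrt

# 5-point Gauss-Hermite rule (weight function exp(-x^2)).
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591]


def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-eta))


def population_averaged_prob(eta, variance):
    """Approximate E[expit(eta + b)] for b ~ N(0, variance)."""
    sd = sqrt(variance)
    return sum(w * expit(eta + sqrt(2.0) * sd * x)
               for x, w in zip(GH_NODES, GH_WEIGHTS)) / sqrt(pi)


conditional = expit(2.0)                       # random effect set to 0
marginal = population_averaged_prob(2.0, 1.0)  # averaged over b
```

With a positive linear predictor, the population-averaged probability is pulled towards 0.5 relative to the prediction with the random effect set to 0, which is why the two should not be used interchangeably.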
Pearson residuals were computed based on approximated population-averaged predictions and their patterns with respect to the different covariates were checked again. The goodness-of-fit of the resulting models was assessed using Akaike’s Information Criterion (AIC).
The unconstrained model was also evaluated based on the Hosmer-Lemeshow test (Hosmer and Lemeshow 2000). The test relies on a statistic that is computed similarly to the aforementioned Pearson residuals. Predictions of the joint probability that a plot be harvested and that a particular tree in this plot be harvested were ranked and grouped according to the deciles of the predictions. For each group, the squared differences between the mean value of predictions and observations were standardized and summed. The Hosmer-Lemeshow statistic is asymptotically distributed as a $\chi^2$ with 8 degrees of freedom under the null hypothesis that observed values do not significantly differ from the predictions (Hosmer and Lemeshow 2000). All predictions were pseudo-independent and population-averaged. This was achieved through a 10-fold cross-validation and the marginalisation of the resulting predictions.
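The grouping and summation described above can be sketched as follows in Python. The function ranks the predictions, splits them into groups of nearly equal size, and sums the standardized squared differences between group means. The function name, the way ties and group boundaries are handled, and the toy data are assumptions. The sketch also assumes at least as many observations as groups and does not compute the p-value.

```python
def hosmer_lemeshow_statistic(observed, predicted, groups=10):
    """Sum of standardized squared differences over prediction-ranked groups."""
    pairs = sorted(zip(predicted, observed))  # rank by predicted probability
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        # Near-equal group sizes; with groups=10 these are the deciles.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        mean_pred = sum(p for p, _ in chunk) / n_g
        mean_obs = sum(y for _, y in chunk) / n_g
        statistic += (n_g * (mean_obs - mean_pred) ** 2
                      / (mean_pred * (1.0 - mean_pred)))
    return statistic


# Hypothetical, perfectly calibrated toy data split into two groups.
preds = [0.2] * 5 + [0.8] * 5
obs = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
hl_calibrated = hosmer_lemeshow_statistic(obs, preds, groups=2)
```

Each term of the sum is the square of the grouped Pearson residual of Eq. 12, so a single poorly predicted group, such as the upper decile, can dominate the statistic.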
3 RESULTS
When simultaneously tested in both parts of the unconstrained model, only the year random effects improved the model fit, with an AIC of 98 345 compared with 103 041 for a model without random effects. The resulting parameter estimates are shown in Table 3. The unconstrained model proved only slightly better than the constrained one (AIC 98 353).
(Insert Table 3 here)
Although the Pearson residuals did not show any evidence of lack of fit for the unconstrained model, the Hosmer-Lemeshow statistic revealed a significant lack of agreement between observations and predictions ($\chi^2_8 = 1963.99$; p-value < 0.0001). Graphical comparison of ranked observations and predictions suggested that this divergence could be related to an underestimation of the events within the 10th decile (Fig. 2). Concordance was evident for all other groups.
(Insert Fig. 2 here)
Based on the estimated elements of matrix $\mathbf{G}$, the correlation between the random effects could be roughly estimated as $\hat{\rho} = \hat{\sigma}_{1,2} / \sqrt{\hat{\sigma}_1^2 \hat{\sigma}_2^2}$. In our case study, a positive correlation of 0.66 was found. This means that if the probability that a plot is harvested is higher than the population average for a given year, the individual-tree probabilities would also tend to be higher than the population average.
In Figs. 3 and 4 we graphically illustrate how the different covariates affect the probabilities that a plot or a tree is harvested. The ranges of the covariates were set according to the values found in the dataset. At the tree level, we chose two different types of plots, a plot in a young stand and a plot in a mature stand. The mean quadratic diameter in these two plots was assumed to be 20 cm and 50 cm for the young and the mature stands, respectively.
(Insert Fig. 3 here)
The plot-level harvest probability increased from 0.45 to 0.70 as the time since the last harvest increased from 1 to 10 years (Fig. 3a). When the date of the last harvest was unknown, the predicted probability increased smoothly with plot basal area, from approximately 0.6 at 18 m² ha⁻¹ to approximately 0.7 at 40 m² ha⁻¹ (Fig. 3b).
The effect of rDBH on the tree-level harvest probabilities followed a similar pattern in what we defined as a mature and a young stand (Fig. 4). Oak trees with small rDBH, i.e. when DBH is smaller than the mean quadratic diameter, were more intensely harvested than beech trees. For an rDBH of 0.50, the predicted probabilities for oak were approximately twice those for beech in both stands. Larger trees of both species were seldom harvested, with probabilities of 0.10 when the rDBH was close to 1.5. It was only when DBH was twice as large as the plot mean quadratic diameter that the predicted probabilities increased for beech trees, with values close to 0.25. In contrast, oak individuals did not seem to be harvested in practice in these situations. Such large trees with rDBH close to 2 only occurred in the young stand.
(Insert Fig. 4 here)
4 DISCUSSION
The present study lays the groundwork for simultaneous parameter estimation in two-level harvest models. The statistical developments needed for this implementation represent relevant progress in the state of the art of zero-inflated models. However, the main achievement of our approach stems from the fact that the full variance-covariance matrix of the random effects can be estimated when random effects are present in both the plot-level and the tree-level parts of the model.
Using zero-inflated mixed models with correlated random effects is not new in forestry. Calama et al. (2011) fitted a zero-inflated log-normal model to forecast the weight of cones produced by a pine species whereas Manso et al. (2014) worked with a zero-inflated binomial model to predict seed predation. In both cases, the random effect correlation was taken into account. In contrast, two-level harvest models had never been analysed under a zero-inflated/altered framework. As a consequence, a simultaneous parameter estimation was not possible and, in turn, the correlation between the random effects could not be considered. The fact that the likelihood formulation of two-level harvest models with mixed effects is more complicated than those of the previous examples may have prevented modelers from testing this approach.
In forestry, zero-inflated and zero-altered models have been used to model various responses such as recruitment (Fortin and DeBlois 2007; Ledo et al. 2015), seed predation (Manso et al. 2014), cone production (Calama et al. 2011), wildfire damage (Guo et al. 2016) or LiDAR-based tree detection (Korhonen et al. 2016). One application that falls conceptually closer to harvesting is that of tree mortality modeling. In this respect, some authors have proposed the Poisson distribution as the underlying process driving mortality in a given plot, provided that mortality takes place at all, so that the number of dead trees in such a context can be represented as a count variable (Affleck 2006).
A weakness of the Poisson distribution is that it has no upper bound whereas there is only a finite number of individuals that can die in a given plot. In order to deal with this issue, an offset variable, also called the exposure, can be specified (McCullagh and Nelder 1989, p.206). As a result, model predictions can be interpreted as a proportion. A more natural approach would be to treat the conditional process as a binary response as well, which would lead to a zero-inflated/altered binomial (e.g. Hall 2000). The binomial distribution therefore provides the expectation that a tree dies, provided that all trees share the same probability.
The same rationale could apply to achieve simultaneous parameter estimation in two-level harvest models as follows: given that a plot is going to be harvested, the number of harvested trees could be assumed Poisson- or binomial-distributed. A major limitation of either approach is that the same probability is assumed for every single tree, as mentioned before. This assumption is not valid, as shown in previous works on the harvesting of individual trees (Thurnher et al. 2011; Fortin et al. 2013; Delisle-Boulianne et al. 2014; Fortin 2014). In partial harvesting, each tree has a distinctive probability of being harvested that depends on a set of covariates. The Poisson approach does not allow for an individual probability assignment whilst the binomial distribution can be adapted. Given that a binomial process is simply the repetition of a Bernoulli trial, the binomial distribution that represents the harvesting of the individual trees can be derived from the individual probabilities. This concept has been defined as a binomial distribution with unequal probabilities (see Friedman 1984).
The distinction between zero-inflated and zero-altered models lies in the truncation at zero of the second distribution that composes the joint distribution (Zeileis et al. 2008). This truncation implies that zeros cannot be observed if a harvest is carried out. This was true in our case study because we modeled the plot-level occurrence and the plots were large in area. If the plots are small and the focus is on the stand-level occurrence, it may happen that no trees are harvested in a given plot simply because it was too small. In such a case, a pure zero-inflated formulation would be more appropriate.
The sign of the correlation between the random effects of the plot-level and the tree-level parts in a zero-inflated model like that applied in our case study can be interpreted. A positive correlation implies that the random effects tend to be both positive or both negative. If the random effects are positive, then the plot has a higher probability of being harvested and, if it is, more trees will be harvested within that plot. Negative random effects indicate that a plot has a lower probability of being harvested and, even if it is, fewer trees will be harvested. In practice, the main implication of a positive correlation is that the deviation of predictions with respect to the mean would be larger than what is predicted when the random effects are uncorrelated. Under the assumption of independent random effects, probabilities higher than the mean at the plot level may be compensated by lower probabilities at the tree level and vice versa, which would eventually lead to underestimating the true variance of the prediction errors. It is also well known that neglecting random effects leads to an underestimation of the variance of the fixed-effects estimates (Gregoire et al. 1995), which may result in an additional impact on the uncertainty of predictions.
The method presented in this study was developed to account for the effects of correlations between random effects in uncertainty assessment, but it was not possible to quantify the implications of ignoring them. The unconstrained model did not markedly outperform the constrained one in terms of AIC, making further comparisons via simulation exercises redundant. Nevertheless, our approach makes it possible to test whether the random effects are correlated and to compare the model with simpler alternatives, which may prove useful in other cases.
Concerning covariates, the variable with the most dominant effect on the probability of harvest at the plot level was the time elapsed since the last harvest; however, this variable may be unknown in cases where harvesting has not yet occurred or took place prior to monitoring. Such gaps in the data can be handled with dummy variables, and we used the plot basal area in place of the time since the last harvest when the latter was unknown. In preliminary trials we assumed that basal area would be the major driver of harvest occurrence and that the time elapsed since the last harvest would not be needed; however, that preliminary model resulted in a poor fit. The significant effect of the time since the last harvest on predictions of plot-level harvest occurrence can be explained by the management of these stands. Once the stem exclusion stage as defined by Oliver and Larson (1996) is over, these stands are usually thinned on a more or less regular basis and the density is not a major criterion in the management (see ONF 2007; Sardin 2008).
At the tree level, the relative diameter and species group were the two variables that were found to have a significant effect on the harvest occurrence. Tree diameter and species were also the most important covariates used in other studies (Fortin 2014; Thurnher et al. 2011). In this respect, we found lower probabilities for larger stems, in contrast to the findings of other studies (e.g. Fortin 2014). Management differences may explain this divergence. In Fortin (2014), the stands were mainly uneven-aged and managed using selection cutting, while in our case the stands are even-aged and low thinnings are applied. Therefore, it is not surprising to find higher probabilities of harvest in smaller stems (Fig. 4). The different species-specific patterns found in the present study are related to the ecology of the species. For instance, the fact that oak is harvested more than beech is related to its tolerance to shade; beech is very shade-tolerant whereas oak is intermediate. Suppressed oak trees have a low probability of survival, as evidenced in Manso et al. (2015). In this context, forest managers prefer to harvest them while they are still alive and valuable.
5 CONCLUSIONS
The covariance of random effects between plot-level and tree-level harvest models cannot be
estimated if the models are fitted independently. In the present study, an approach inspired by
zero-inflated modeling was developed that simultaneously estimates the parameters at both plot
and tree levels and, in turn, the covariance of the random effects. The method was applied in a
case study and compared to the classical approach of estimating the parameters independently.
The method presented did not clearly outperform the existing approach; however, we believe that
it is still a relevant contribution to harvest modeling, given the potential implications of
neglecting random effect covariance for uncertainty assessment.
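As a simplified illustration of why this covariance matters (a sketch under stated assumptions, not the full likelihood of the paper), let $p$ and $\pi$ denote the plot-level and tree-level harvest probabilities in a given year, both of which depend on the year random effects. The expectation of their product over years decomposes as:

```latex
\mathrm{E}[p\,\pi] = \mathrm{E}[p]\,\mathrm{E}[\pi] + \mathrm{Cov}(p, \pi)
```

The covariance term vanishes when the two random effects are independent. If they are in fact correlated, models fitted independently implicitly set this term to zero, which affects joint predictions and the assessment of their uncertainty.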
REFERENCES
Affleck, D. L. (2006). Poisson mixture models for regression analysis of stand-level mortality. Canadian Journal of Forest Research, 36(11):2994–3006.
Antón-Fernández, C. and Astrup, R. (2012). Empirical harvest models and their use in regional business-as-usual scenarios of timber supply and carbon stock development. Scandinavian Journal of Forest Research, 27(4):379–392.
Calama, R., Mutke, S., Tomé, J., Gordo, J., Montero, G., and Tomé, M. (2011). Modelling spatial and temporal variability in a zero-inflated variable: The case of stone pine (Pinus pinea L.) cone production. Ecological Modelling, 222(3):606–618.
Delisle-Boulianne, S., Fortin, M., Achim, A., and Pothier, D. (2014). Modelling stem selection in northern hardwood stands: assessing the effects of tree vigour and spatial correlations using a copula approach. Forestry, 87:607–617.
Eastaugh, C. S. and Hasenauer, H. (2012). A statistical thinning model for initialising large-scale ecosystem models. Scandinavian Journal of Forest Research, 27(6):567–577.
Fortin, M. (2013). Population-averaged predictions with generalized linear mixed-effects models in forestry: an estimator based on Gauss-Hermite quadrature. Canadian Journal of Forest Research, 43:129–138.
Fortin, M. (2014). Using a segmented logistic model to predict trees to be harvested in forest growth forecasts. Forest Systems, 23(1):139.
Fortin, M. and DeBlois, J. (2007). Modeling tree recruitment with zero-inflated models: The example of hardwood stands in southern Quebec, Canada. Forest Science, 53(4):529–539.
Fortin, M., Delisle-Boulianne, S., and Pothier, D. (2013). Considering spatial correlations between binary response variables in forestry: an example applied to tree harvest modeling. Forest Science, 59(3):253–260.
Friedman, M. F. (1984). On the extended binomial distribution. Computers & Operations Research, 11(3):241–243.
Gregoire, T., Schabenberger, O., and Barrett, J. (1995). Linear modelling of irregularly spaced, unbalanced, longitudinal data from permanent-plot measurements. Canadian Journal of Forest Research, 25:137–156.
Guo, F., Wang, G., Innes, J., Ma, Z., Liu, A., and Lin, Y. (2016). Comparison of six generalized linear models for occurrence of lightning-induced fires in northern Daxing'an Mountains, China. Journal of Forestry Research, 27(2):379–388.
Hall, D. (2000). Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics, 56:1030–1039.
Hosmer, D. J. and Lemeshow, S. (2000). Applied logistic regression. John Wiley & Sons, New York, 2nd edition.
Korhonen, L., Salas, C., Østgård, T., Lien, V., Gobakken, T., and Næsset, E. (2016). Predicting the occurrence of large-diameter trees using airborne laser scanning. Canadian Journal of Forest Research, 46(4):461–469.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14.
Ledo, A., Cayuela, L., Manso, R., and Condés, S. (2015). Recruitment of woody plants in a cloud forest: a combination of spatial mechanisms. Journal of Vegetation Science, 26(5):876–888.
Manso, R., Morneau, F., Ningre, F., and Fortin, M. (2015). Incorporating stochasticity from extreme climatic events and multi-species competition relationships into single-tree mortality models. Forest Ecology and Management, 354:243–253.
Manso, R., Pardos, M., and Calama, R. (2014). Climatic factors control rodent seed predation in Pinus pinea L. stands in Central Spain. Annals of Forest Science, 71(8):873–883.
McCullagh, P. and Nelder, J. A. (1989). Generalized linear models. Monographs of Statistics and Applied Probability 37. Chapman & Hall, New York, 2nd edition.
McCulloch, C., Searle, S., and Neuhaus, J. M. (2008). Generalized, linear, and mixed models. John Wiley & Sons, New York.
Melo, L. C., Schneider, R., Manso, R., Saucier, J.-P., and Fortin, M. (2017). Using survival analysis to predict the harvesting of forest stands in Quebec, Canada. Canadian Journal of Forest Research, 0:accepted.
Oliver, C. D. and Larson, B. C. (1996). Forest Stand Dynamics. Updated Edition. John Wiley and Sons, New York, USA.
ONF (2007). Gestion des hêtraies dans les forêts publiques françaises. Office National des Forêts.
Pinheiro, J. and Bates, D. (2000). Mixed effects models in S and S-PLUS. Springer, New York.
Pukkala, T. and Miina, J. (1998). Tree-selection algorithms for optimizing thinning using a distance-dependent growth model. Canadian Journal of Forest Research, 28:693–702.
Sardin, T. (2008). Chênaies continentales. Guide des sylvicultures. Office National des Forêts.
SAS Institute Inc. (2008). SAS/STAT 9.2 User's Guide. SAS Institute Inc., Cary, NC.
Sterba, H., Golser, M., Moser, M., and Schadauer, K. (2000). A timber harvesting model for Austria. Computers and Electronics in Agriculture, 28(2):133–149.
Thurnher, C., Klopf, M., and Hasenauer, H. (2011). Forests in transition: a harvesting model for uneven-aged mixed species forests in Austria. Forestry, 84(5):517–526.
Zeileis, A., Kleiber, C., and Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8).
A IMPLEMENTATION OF THE MODEL IN SAS
The following code reproduces the elements needed to fit the two-level harvest model introduced
in the present paper under the unconstrained assumption for the variance-covariance matrix of
the random effects.
/*******************************************************************/
proc nlmixed data=... ;
  parms
    g0 = , ... , g3 =        /* Initial values of parameters, plot-level */
    b0 = , ... , b10 =       /* Initial values of parameters, tree-level */
    su2 = , sv2 = , suv = ;  /* Initial values of the random-effect variances and covariance */
  /* Arrays for the different covariates. 1433 is the maximum number of
     observations in a plot in the present example */
  array ssd [1433] ssd1-ssd1433;  /* scaled mean square diameter */
  array bee [1433] bee1-bee1433;  /* dummy species = beech */
  array oak [1433] oak1-oak1433;  /* dummy species = oak */
  array hor [1433] hor1-hor1433;  /* dummy species = hornbeam */
  array oth [1433] oth1-oth1433;  /* dummy species = others */
  array thr [1433] thr1-thr1433;  /* dummy dbh > threshold */
  array cut [1433] cut1-cut1433;  /* dummy cut tree */
  allcutProb = 1;    /* running product of (1 - pi): probability that no tree is cut */
  likTreeLevel = 0;  /* running sum of the tree-level Bernoulli log-likelihood */
  do i = 1 to N;  /* N is the number of trees in each plot */
    marginalLinearTermsTree = b0 +
      (b1*bee[i] + b2*oak[i] + b3*hor[i] + b4*oth[i] +
      (b5*bee[i] + b6*oak[i] + b7*hor[i] + b8*oth[i])*thr[i])*ssd[i] +
      (b9 + b10*thr[i])*(ssd[i]**2);
    linearTermsTree = marginalLinearTermsTree + u;  /* u: tree-level random effect */
    pi = exp(linearTermsTree)/(1 + exp(linearTermsTree));
    allcutProb = allcutProb*(1-pi);
    likTreeLevel = likTreeLevel + cut[i]*log(pi) + (1-cut[i])*log(1-pi);
  end;
  marginalLinearTermsPlot = notCutYet*(g0 + g1*ST) +
                            alreadyCut*(g2 + g3*timeSinceLastCut);
  linearTermsPlot = marginalLinearTermsPlot + v;  /* v: plot-level random effect */
  p = exp(linearTermsPlot)/(1 + exp(linearTermsPlot));
  /* cutPlot: dummy thinned plot. The last term conditions the tree-level
     likelihood on at least one tree being cut in a thinned plot */
  logLik = (1-cutPlot)*log(1 - p) + cutPlot*log(p) + cutPlot*likTreeLevel
           - cutPlot*log(1 - allcutProb);
  model cutPlot ~ general(logLik);
  random u v ~ normal([0, 0], [su2, suv, sv2]) subject=year;
run;
/*******************************************************************/
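For readers who do not use SAS, the conditional log-likelihood contribution of one plot-year evaluated by the code above can be mirrored in Python. This is a minimal sketch, not part of the original implementation: the function names are hypothetical, the linear predictors are passed in directly rather than built from covariates, and the integration over the random effects `u` and `v` (carried out by PROC NLMIXED) is not reproduced.

```python
import math
from typing import Sequence


def inv_logit(x: float) -> float:
    """Logistic function, as used for p and pi in the SAS code."""
    return 1.0 / (1.0 + math.exp(-x))


def plot_log_likelihood(
    cut_plot: int,
    cut_trees: Sequence[int],
    tree_eta: Sequence[float],
    plot_eta: float,
    u: float = 0.0,
    v: float = 0.0,
) -> float:
    """Log-likelihood of one plot-year, conditional on random effects u and v."""
    p = inv_logit(plot_eta + v)
    if cut_plot == 0:
        return math.log(1.0 - p)
    none_cut_prob = 1.0  # product of (1 - pi) over trees (allcutProb in SAS)
    lik_tree = 0.0       # tree-level Bernoulli log-likelihood (likTreeLevel)
    for cut, eta in zip(cut_trees, tree_eta):
        pi = inv_logit(eta + u)
        none_cut_prob *= 1.0 - pi
        lik_tree += cut * math.log(pi) + (1 - cut) * math.log(1.0 - pi)
    # Subtracting log(1 - none_cut_prob) conditions on at least one tree being cut
    return math.log(p) + lik_tree - math.log(1.0 - none_cut_prob)


# Check on a hypothetical two-tree plot: the probabilities of "plot not thinned"
# and of every thinning pattern with at least one cut tree should sum to one.
args = dict(tree_eta=[-1.0, 0.5], plot_eta=0.3, u=0.2, v=-0.1)
total = math.exp(plot_log_likelihood(0, [0, 0], **args))
for pattern in ([1, 0], [0, 1], [1, 1]):
    total += math.exp(plot_log_likelihood(1, pattern, **args))
```

The check at the end illustrates the role of the last term of `logLik`: without it, the pattern in which a thinned plot has no cut tree would retain some probability mass.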
Tables

Table 1: Summary of the dataset by experiment including the range of the main variables for the considered measurement periods.

Experimental site    Initial date  Observations  DBH (cm)    Basal area (m² ha⁻¹)  Density (stems ha⁻¹)  Interval length (years)  Plot size (ha)  Prop. intervals with cuts
Allée de Blois       1927          2946          20.4-85.3   23.8-37.4             145-304               1-10                     1               0.57
Bois Brochet         1931          6577          10.5-85     20.8-29.7             94-602                1-10                     1               0.67
Beaulieu             1923          3032          9.5-50.6    21.2-27.2             352-673               2-8                      1               0.67
Butte de Tir         1959          5821          7.6-53.2    19.6-34.7             168-1102              1-9                      0.5             0.5
Camp Souverain       1927          769           25.1-85.3   25.9-36.6             138-217               1-5                      1               0.4
Camp Cusson          1931          14501         7.3-90.7    20.0-43.1             81-813                1-7                      1               0.61
Charlemagne          1923          2606          6.4-80.5    29.7-47.6             156-988               2-10                     0.2             0.82
Chatellier           1934          5808          7-85.3      22.2-37.2             96-611                3-8                      1               0.6
Chavigny             1923          2209          6.4-68.8    31.9-45.2             155-1115              2-10                     0.2             0.62
Chrétiennette        1922          2366          7-73.8      18.2-39.6             96-531                3-10                     0.3             0.62
Clés des Fossés      1931          2326          14.3-96.1   30.4-38.7             98-483                1-10                     1               0.73
Côtes aux Hêtreaux   1927          1260          19.7-87.5   30.2-37               163-275               2-9                      1               0.83
Croix de Saverne     1946          945           6.7-58.3    28.6-30.5             277-335               5                        1               0.33
Ducellier            1945          1909          15.6-75.8   16.6-26.1             91-283                1-10                     1               0.7
Epicéas              1923          3260          9.2-75.1    21-44.2               160-776               2-10                     0.2             0.53
Faîte                1922          12671         4.5-61.8    10.1-33.6             165-2155              2-10                     0.2             0.71
Grande Bouzule       1928          19860         6-70.3      19.1-36.7             147-1223              1-9                      1               0.58
Grand Pierrier       1923          1128          8-66.2      31.2-39               156-1068              1-10                     0.2             0.7
Hallet               1951          13575         6-65.9      15.3-35.5             205-1162              2-10                     1               0.79
Hermousset           1934          9657          5.7-81.8    19.8-34.6             122-1433              3-10                     1               0.89
Lacharmaie           1925          589           35.7-77.0   2.0-23.6              7-126                 2-6                      2               0.8
Fontaine aux Ordons  1923          1797          13.4-80.2   27.9-33.4             145-309               2-8                      1               0.75
Launay-Morel         1945          330           30.9-144.2  6.8-28.1              20-89                 4-6                      2               1
M. des Cordeliers    1928          9024          14.6-87.2   21.8-35.8             92-567                1-9                      1               0.67
Morat                1931          798           14-106      15.6-46.3             34-137                5-7                      2               0.8
Mortefert            1922          569           39.2-106    15.6-31.6             35-107                3-7                      1               0.57
Pauverts             1928          14743         6-74.2      20.1-33.9             119-1242              1-10                     1               0.58
Plantonnée           1981          589           6-43.6      4.3-23.4              84-376                1-10                     0.5             0.4
Plô du Poteau        1931          639           25.5-76.7   5.8-21.5              28-156                1-6                      0.9             0.4
Pré des Seigneurs    1922          1167          29.6-95.5   34.7-44.4             107-203               3-10                     1               0.57
Puiseux-en-Retz      1922          3035          6.4-78.9    21-31.3               94-1572               3-10                     0.5             0.7
Rennweg              1951          1388          9.9-44.2    25.5-29.1             684-728               5                        0.5             0.75
Carré latin de Réno  1989          10354         5.7-54.7    16.3-34.1             295-1010              1-5                      0.2             0.62
Richebourg           1931          2100          17.5-112.7  30.0-34.6             88-212                2-8                      2               0.71
Sablonnières         1966          8521          7.0-65.3    10.9-35.4             81-863                1-10                     0.8             0.62
Sainte Marie         1958          1310          3.8-41.7    22.6-23.5             606-740               4                        1               1
Trésor               1959          5478          6-69.4      23.6-31.9             199-878               4-10                     1               1
Verbamont            1923          766           17.8-88.8   9.2-21.8              58-172                1-8                      1               0.5

Table 2: Frequencies of the proportion of cut trees (all, beech, oak) in the years and plots where thinnings took place. Proportions are set over the total and per species (when present in the plot).

Proportion  All species/total  Beech/total beech  Beech/total  Oak/total oak  Oak/total
0-0.05      93                 50                 121          45             108
0.05-0.15   119                74                 72           66             71
0.15-0.25   138                78                 59           99             77
0.25-0.35   62                 47                 27           39             29
0.35-0.45   20                 12                 5            16             7
0.45-0.55   3                  7                  3            6              1
0.55-0.65   3                  3                  2            3              1
0.65-0.75   1                  6                  0            3              0
0.75-0.85   1                  2                  0            3              1
0.85-0.95   0                  2                  0            2              0
0.95-1      1                  12                 4            16             1

Table 3: Maximum likelihood parameter estimates and standard errors of the simultaneous plot and tree thinning model. $\sigma^2_{year,b_1}$ and $\sigma^2_{year,b_2}$ stand for the variances of the year random effects for the plot and tree models, while $\sigma_{year,b_1,b_2}$ is the covariance between them.

Parameter                 Estimate   Standard error
$\beta_0$                 -0.3694    0.2862
$\beta_1$                 0.0387     0.0135
$\beta_2$                 0.1760     0.0368
$\gamma_0$                -2.1306    0.1634
$\gamma_{1,beech}$        -4.0513    0.1587
$\gamma_{1,oak}$          -5.3613    0.1508
$\gamma_{1,hornbeam}$     -0.9210    0.5513
$\gamma_{1,others}$       -2.3803    0.7612
$\gamma_{2,beech}$        -1.2513    0.3242
$\gamma_{2,oak}$          -3.7132    0.3038
$\gamma_{2,hornbeam}$     -2.0291    0.6480
$\gamma_{2,others}$       -0.1231    1.5476
$\gamma_3$                4.0450     0.1989
$\gamma_4$                -11.1102   0.4614
$\sigma^2_{year,b_1}$     1.5807     -
$\sigma^2_{year,b_2}$     1.5076     -
$\sigma_{year,b_1,b_2}$   1.0155     -
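
To help interpret the variance components in Table 3, the following sketch converts them into the correlation between the two year random effects and shows one way of drawing correlated effects for simulation. The Cholesky-based sampler and all names are illustrative assumptions, not the procedure used in the paper.

```python
import math
import random

# Variance components as reported in Table 3
var_plot = 1.5807  # variance of the year random effect, plot model
var_tree = 1.5076  # variance of the year random effect, tree model
cov = 1.0155       # covariance between the two year random effects

# Correlation implied by the estimated variance-covariance matrix
correlation = cov / math.sqrt(var_plot * var_tree)

# Cholesky factor of the 2x2 variance-covariance matrix
l11 = math.sqrt(var_plot)
l21 = cov / l11
l22 = math.sqrt(var_tree - l21 ** 2)


def draw_year_effects(rng: random.Random) -> tuple:
    """Draw one pair of correlated (plot, tree) year random effects."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


rng = random.Random(1)
b1, b2 = draw_year_effects(rng)
print(f"implied correlation: {correlation:.3f}")
```

The estimates correspond to a correlation of roughly 0.66, i.e. years with a high propensity for plots to be thinned also tend to be years in which a larger share of trees is harvested within thinned plots.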

Figure Captions
• Figure 1. Location of sample plots
• Figure 2. Predicted (circles) and observed (triangles) numbers of events by decile in the Hosmer-Lemeshow test
• Figure 3. Effect of the time since the last harvest (a) or the basal area (b) when this time is unknown on the plot-level probabilities of harvest. The 0.95 confidence envelopes are represented in gray
• Figure 4. Effect of the relative DBH, i.e. the ratio between tree DBH and plot mean quadratic diameter, on the conditional probabilities that a tree is harvested in a mature stand (a) and a young stand (b). The 0.95 confidence envelopes are represented in gray